[Info-vax] Integrated Databases
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Wed May 30 13:17:18 EDT 2018
On 2018-05-30 02:50:09 +0000, Dave Froble said:
> On 5/29/2018 11:48 AM, Stephen Hoffman wrote:
>> On 2018-05-29 14:51:26 +0000, Dave Froble said:
>>
>>> On 5/29/2018 9:34 AM, Stephen Hoffman wrote:
>>>
>>>>> PostgreSQL would be a phenom bundle for businesses looking for an
>>>>> enterprise class database.
>>>>
>>>> Ayup. https://www.postgresql.org/docs/10/static/index.html
>>>>
>>>> But that port has unfortunately been blocked by flaws in the OpenVMS
>>>> SSIO implementation.
>>>
>>> Which should be fixable by a rather simple enhancement to the DLM. I
>>> suggested such an enhancement to VSI, namely numeric range locking, and
>>> hope that work on the port has pushed any consideration of the
>>> suggestion until after the port is completed. Or, perhaps, NIH is the
>>> problem. We'll see.
>>
>> SSIO and the issues around atomicity are not particularly related to
>> byte-range locking.
>> http://de.openvms.org/TUD2012/opensource_and_unix_portability.pdf
>
> Ok, that was a bit interesting. Apparently I didn't understand the
> entire problem.
>
> I do seem to recall someone, perhaps Robert, stating that a partial
> block read can be done. He didn't mention partial block writes. Nor
> am I all that sure that such would be a good idea.
Though I don't think it's the intended reference, neither HDDs nor SSDs
support fractional-sector writes. On any platform. The issue here is
that there are I/O caches all over the place in OpenVMS and in the
underlying hardware, and the contents of those OpenVMS caches are
uncoordinated. That's a pretty nasty data corruption problem lurking
for folks using stream access, unfortunately.
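To make the hazard concrete, here's a minimal C sketch, not
OpenVMS-specific and purely illustrative, of the read-modify-write
race: two writers update different bytes of the same 512-byte block
through private cached copies, and the second write-back silently
discards the first writer's change.

    /* Illustrative only: two writers share a 512-byte "disk" block. */
    #include <stdio.h>
    #include <string.h>

    #define BLKSIZ 512
    static char disk[BLKSIZ];           /* the on-disk block */

    int main(void)
    {
        char cache_a[BLKSIZ], cache_b[BLKSIZ];

        memcpy(cache_a, disk, BLKSIZ);  /* writer A reads the block */
        memcpy(cache_b, disk, BLKSIZ);  /* writer B reads it, too   */

        cache_a[10]  = 'A';             /* A updates byte 10        */
        cache_b[400] = 'B';             /* B updates byte 400       */

        memcpy(disk, cache_a, BLKSIZ);  /* A writes the whole block */
        memcpy(disk, cache_b, BLKSIZ);  /* B overwrites A's update  */

        printf("byte 10: %s\n", disk[10] == 'A' ? "intact" : "lost");
        return 0;
    }

Without some coordination of the cached copies, the program prints
"lost", and that's exactly what uncoordinated caches can do to
sub-block stream writes.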
> The first problem I see with the SSIO is that it appears to be
> implemented in the CRTL. That is about as disgusting an approach as I
> can imagine. If it is to be a VMS capability (and disk I/O seems to
> fit into that category), then it should be implemented as part of VMS.
Alas, this particular implementation problem largely exists in the XQP
and a few related underpinnings. It's in the file system and caching
and below the layer of the languages and the language RTLs. The XQP
contains a good chunk of the coordination necessary for keeping I/O
consistent and caches coherent across processes on a host and processes
across a cluster, too. That coordination is largely managed with the
DLM, but DLM knows zilch about volumes and volume allocation maps, I/O
reads and writes, dirty cache, and whatnot.
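For those who haven't used it, the DLM interface itself is agnostic
about what a resource name protects, which is both its flexibility
and its limitation here. A minimal C sketch of taking and releasing
a lock (the resource name MYAPP$BLOCK_CACHE is a made-up example,
and error handling is abbreviated):

    #include <descrip.h>
    #include <lckdef.h>
    #include <starlet.h>

    /* Lock status block: completion status, reserved, lock ID. */
    struct lksb { unsigned short cond, resvd; unsigned int lkid; };

    int main(void)
    {
        struct lksb lksb = { 0 };
        /* To the DLM this is just a string; the DLM knows nothing
           about the volume, cache, or I/O it happens to coordinate. */
        $DESCRIPTOR(resnam, "MYAPP$BLOCK_CACHE");

        /* Queue, and wait for, a protected-write mode lock. */
        int status = sys$enqw(0, LCK$K_PWMODE, (void *)&lksb, 0,
                              &resnam, 0, 0, 0, 0, 0, 0, 0);
        if (!(status & 1) || !(lksb.cond & 1))
            return status;

        /* ...update the shared structure, flush dirty cache... */

        return sys$deq(lksb.lkid, 0, 0, 0);     /* release */
    }

The DLM arbitrates whoever queues for that name, and nothing more;
all the volume, allocation map, and cache knowledge lives in the
callers.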
> Regardless, from what I read, it appears that there is still a need for
> locking a range of bytes, which may span block boundaries, and may not
> include whole blocks. Now, as I see it, locking should be done by the
> DLM.
As mentioned above, cache coherence can involve locking, yes.
> At this time, the DLM has one type of lock, a lock on a resource name.
> My idea was to allow multiple types of locks. A second type might lock
> a numeric range, x to y. Some part of the lock would be two 8-byte
> integers, and for that type additional code to determine if any part of
> that range is already locked.
Unclear. Byte-range locking might provide an optimization around cache
coordination, or it might not.
> Once there are DLM types of locks, while I have no idea at this time
> what a third type of lock might be, there is the option for additional
> types of locks. Sort of flexible, huh?
I can see uses for byte-range locks, entirely separate from this
particular SSIO case.
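One illustration of the idea, sketched under the assumption that
today's single-type DLM is all that's available: a byte range can be
approximated by taking one lock per block the range touches, with the
file and block number encoded into the resource name. The naming
scheme and the lock_range helper here are hypothetical; a real
implementation would $ENQW each name rather than print it.

    #include <stdio.h>

    #define BLOCK_SIZE 512

    /* Map a byte range onto block-granular resource names. */
    static void lock_range(const char *file_id,
                           unsigned long long start,
                           unsigned long long length)
    {
        unsigned long long first = start / BLOCK_SIZE;
        unsigned long long last  = (start + length - 1) / BLOCK_SIZE;

        for (unsigned long long blk = first; blk <= last; blk++) {
            char resnam[64];
            /* Two processes touching the same block collide on the
               same resource name, so the DLM arbitrates for them. */
            snprintf(resnam, sizeof resnam, "SSIO$%s$%llu",
                     file_id, blk);
            printf("would $ENQW a PW lock on %s\n", resnam);
        }
    }

    int main(void)
    {
        /* A 100-byte write at offset 500 spans blocks 0 and 1. */
        lock_range("FID_12_34_0", 500, 100);
        return 0;
    }

A native range-lock type in the DLM would avoid the lock-per-block
overhead, which is presumably the optimization being weighed.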
> As for the I/O. There are several concerns.
>
> First, we're looking at soon having memory for storage, beyond SSDs
> which mimic disks. With such, there is no longer a concept of disk
> blocks. There would be no temporary data in buffers, all work would be
> direct to memory.
True. Though that does depend on how the performance of the
byte-addressable storage compares with that of cache memory and main
memory. We may well encounter memory caches in front of
byte-addressable non-volatile storage, akin to the processor caches
used with main memory. Byte-addressable storage operating at the
speed of SSD would still be very useful, after all. There'll
undoubtedly be some sort of
memory management or mapping or other layering here around memory
access protections, as rogue reads and rogue writes would be bad.
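That layering is already visible on current x86 hardware with
persistent memory: ordinary stores land in volatile processor caches,
and software has to flush the affected cache lines explicitly before
the data can be considered durable. A minimal sketch, assuming a
64-byte cache line and a compiler providing the CLFLUSHOPT intrinsic
(compile with -mclflushopt; on other platforms the names differ):

    #include <immintrin.h>  /* _mm_clflushopt, _mm_sfence */
    #include <stddef.h>

    /* Push a buffer's cache lines toward byte-addressable NVM. */
    static void persist(const void *buf, size_t len)
    {
        const char *p = (const char *)buf;
        for (size_t off = 0; off < len; off += 64)
            _mm_clflushopt((void *)(p + off));
        _mm_sfence();   /* order the flushes before later stores */
    }

So "no buffers, direct to memory" still leaves the caches and their
write-back ordering to be managed.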
> If PostgreSQL is indeed a worthy database product for VMS, and I know
> absolutely nothing about the product, perhaps it can be modified to
> conform to disk I/O capabilities of VMS. This would immediately give
> it all the current VMS capabilities, clusters, mirrored storage, etc.
Chapter 26 of the PostgreSQL manual
https://www.postgresql.org/files/documentation/pdf/10/postgresql-10-US.pdf
has a write-up on what's already available on platforms that
support PostgreSQL, and the different options and trade-offs involved.
For a number of folks and applications, a PostgreSQL failover to a warm
standby server with data that's current to within a second and with the
failover processing requiring under a second is acceptable, and
PostgreSQL can provide that. OpenVMS can potentially do better than
that when configured appropriately and with sufficient hardware, though
cluster failover will need to be configured to meet that one-second or
better window, and the OpenVMS apps will themselves have to be written
for that failover. And if the available failover or replication
mechanisms for PostgreSQL don't meet local application needs, the
PostgreSQL foreign data wrapper interface can allow far more extensive
customizations to be implemented.
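For reference, the core of a PostgreSQL 10 warm-standby configuration
is small; a sketch, with the host name and the replication role name
as placeholders (the details and trade-offs are in that Chapter 26):

    # primary: postgresql.conf
    wal_level = replica        # ship enough WAL for a standby
    max_wal_senders = 4        # allow streaming connections

    # standby: recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=primary.example port=5432 user=replicator'

The standby is typically seeded with pg_basebackup, and promotion on
failover is a pg_ctl promote.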
As for other approaches available on some Linux distros and probably
also elsewhere, it's a fairly involved OpenVMS hardware and software
configuration that does better than what replicated and RAID'd NAS or
what a DRBD configuration provides for developers and applications, for
instance.
I haven't looked at how the DLM and GFS2 clustering bits in RedHat
compare with OpenVMS clustering, though there are some definite
similarities and also some substantial differences. Probably the
biggest difference is the lack of shared write to a file, and that's
not a small difference.
Hadoop is a different approach that VSI has discussed on occasion,
though there are differences there, too. Hadoop ties together both
clustering and the database system, which is a rather different
approach than the one OpenVMS folks are likely familiar with.
Many folks can work with a one-second window and a warm secondary with
PostgreSQL. More than a few OpenVMS folks run those configurations,
too. Sometimes even in a cluster configuration. The implementations
here weigh local requirements, OpenVMS costs (particularly the
clustering and HBVS and related licenses), the fact that servers
reboot far faster than they used to, and many other factors.
> Perhaps as the SSIO appears to attempt, additional I/O capabilities
> should be implemented on VMS. Not sure this is a good approach, as
> long as disks are still in use. Seems to me to be chasing the past,
> not the future.
SSIO is an (incomplete) approach seeking to provide the expected
consistency of what's written to disk.
> Note, most of the above comes from the perspective of software
> architecture, more so than engineering.
OpenVMS itself still adores punched cards. OpenVMS doesn't support
streams all that well.
As for OpenVMS and clustering and HBVS and the rest, there's not a
whole lot of doc on designing and writing and testing apps for
clustering, nor on backing up and converting RMS files on-line.
Here's a box of parts and pieces, and everybody is unfortunately
assumed to know how the pieces fit together, what the trade-offs are,
and the various potential causes of outages, such as ;32767 or ;65535.
But that's all fodder for another discussion or twenty.
As for failover and PostgreSQL and clustering in general... Or Hadoop,
for that matter. This is all part of what OpenVMS is competing with in
the current market. The numbers of folks that need the specific
features of OpenVMS (outside of the installed base, of course) aren't
all that large, but it's the turf that VSI marketing is undoubtedly
aiming at, and opening up the numbers of folks that'll be interested
in the future, and that can grow into these requirements, is where
VSI development is certainly headed.
There are some folks and some apps that can't take a one-second hit.
There are others that can. As for the task ahead of VSI marketing in
this area, prizing folks off of PostgreSQL on RedHat with an offer of
OpenVMS and Oracle Rdb or Oracle classic — or a competitive port of
PostgreSQL or a Hadoop implementation, for that matter — won't be an
easy sale. This is where the low end and the low-end business story
really matter, and OpenVMS doesn't have one of those yet. The x86-64
port and the LLVM work and other tasks currently underway will be part
of that story, and the x86-64 port can (will?) be the first big piece
of the entry level. But there's a whole lot more work involved beyond
the x86-64 port, and SSIO will eventually be part of that. That work
will happen at VSI, and eventually at third-party developers.
--
Pure Personal Opinion | HoffmanLabs LLC