[Info-vax] Integrated Databases
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Wed May 30 13:17:18 EDT 2018
On 2018-05-30 02:50:09 +0000, Dave Froble said:
> On 5/29/2018 11:48 AM, Stephen Hoffman wrote:
>> On 2018-05-29 14:51:26 +0000, Dave Froble said:
>>
>>> On 5/29/2018 9:34 AM, Stephen Hoffman wrote:
>>>
>>>>> PostgreSQL would be a phenom bundle for businesses looking for an
>>>>> enterprise class database.
>>>>
>>>> Ayup. https://www.postgresql.org/docs/10/static/index.html
>>>>
>>>> But that port has unfortunately been blocked by flaws in the OpenVMS
>>>> SSIO implementation.
>>>
>>> Which should be fixable by a rather simple enhancement to the DLM. I
>>> suggested such an enhancement to VSI, namely numeric range locking, and
>>> hope that work on the port has pushed any consideration of the
>>> suggestion until after the port is completed. Or, perhaps, NIH is the
>>> problem. We'll see.
>>
>> SSIO and the issues around atomicity are not particularly related to
>> byte-range locking.
>> http://de.openvms.org/TUD2012/opensource_and_unix_portability.pdf
>
> Ok, that was a bit interesting. Apparently I didn't understand the
> entire problem.
>
> I do seem to recall someone, perhaps Robert, stating that a partial
> block read can be done. He didn't mention partial block writes. Nor
> am I all that sure that such would be a good idea.
Though I don't think it's the intended reference, neither HDDs nor SSDs
support fractional-sector writes. On any platform. The issue here is
that there are I/O caches all over the place in OpenVMS and in the
underlying hardware, and the contents of those OpenVMS caches are
uncoordinated. That's a pretty nasty data corruption problem lurking
for folks using stream access, unfortunately.
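To make the hazard concrete, here's a minimal C sketch, not
OpenVMS-specific and purely illustrative, of the read-modify-write
race: two writers update different bytes of the same 512-byte block
through private cached copies, and the second write-back silently
discards the first writer's change.

    /* Illustrative only: two writers share a 512-byte "disk" block. */
    #include <stdio.h>
    #include <string.h>

    #define BLKSIZ 512
    static char disk[BLKSIZ];           /* the on-disk block */

    int main(void)
    {
        char cache_a[BLKSIZ], cache_b[BLKSIZ];

        memcpy(cache_a, disk, BLKSIZ);  /* writer A reads the block */
        memcpy(cache_b, disk, BLKSIZ);  /* writer B reads it, too   */

        cache_a[10]  = 'A';             /* A updates byte 10        */
        cache_b[400] = 'B';             /* B updates byte 400       */

        memcpy(disk, cache_a, BLKSIZ);  /* A writes the whole block */
        memcpy(disk, cache_b, BLKSIZ);  /* B overwrites A's update  */

        printf("byte 10: %s\n", disk[10] == 'A' ? "intact" : "lost");
        return 0;
    }

Without some coordination of the cached copies, the program prints
"lost", and that's exactly what uncoordinated caches can do to
sub-block stream writes.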
> The first problem I see with the SSIO is that it appears to be
> implemented in the CRTL. That is about as disgusting an approach as I
> can imagine. If it is to be a VMS capability (and disk I/O seems to
> fit into that category), then it should be implemented as part of VMS.
Alas, this particular implementation problem largely exists in the XQP
and a few related underpinnings. It's in the file system and caching
and below the layer of the languages and the language RTLs. The XQP
contains a good chunk of the coordination necessary for keeping I/O
consistent and caches coherent across processes on a host and processes
across a cluster, too. That coordination is largely managed with the
DLM, but DLM knows zilch about volumes and volume allocation maps, I/O
reads and writes, dirty cache, and whatnot.
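For those who haven't used it, the DLM interface itself is agnostic
about what a resource name protects, which is both its flexibility
and its limitation here. A minimal C sketch of taking and releasing
a lock (the resource name MYAPP$BLOCK_CACHE is a made-up example,
and error handling is abbreviated):

    #include <descrip.h>
    #include <lckdef.h>
    #include <starlet.h>

    /* Lock status block: completion status, reserved, lock ID. */
    struct lksb { unsigned short cond, resvd; unsigned int lkid; };

    int main(void)
    {
        struct lksb lksb = { 0 };
        /* To the DLM this is just a string; the DLM knows nothing
           about the volume, cache, or I/O it happens to coordinate. */
        $DESCRIPTOR(resnam, "MYAPP$BLOCK_CACHE");

        /* Queue, and wait for, a protected-write mode lock. */
        int status = sys$enqw(0, LCK$K_PWMODE, (void *)&lksb, 0,
                              &resnam, 0, 0, 0, 0, 0, 0, 0);
        if (!(status & 1) || !(lksb.cond & 1))
            return status;

        /* ...update the shared structure, flush dirty cache... */

        return sys$deq(lksb.lkid, 0, 0, 0);     /* release */
    }

The DLM arbitrates whoever queues for that name, and nothing more;
all the volume, allocation map, and cache knowledge lives in the
callers.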
> Regardless, from what I read, it appears that there is still a need for
> locking a range of bytes, which may span block boundaries, and may not
> include whole blocks. Now, as I see it, locking should be done by the
> DLM.
As mentioned above, cache coherence can involve locking, yes.
> At this time, the DLM has one type of lock, a lock on a resource name.
> My idea was to allow multiple types of locks. A second type might lock
> a numeric range, x to y. Some part of the lock would be two 8-byte
> integers, and for that type additional code to determine if any part of
> that range is already locked.
Unclear. Byte-range locking might provide an optimization around cache
coordination, or it might not.
> Once there are DLM types of locks, while I have no idea at this time
> what a third type of lock might be, there is the option for additional
> types of locks. Sort of flexible, huh?
I can see uses for byte-range locks, entirely separate from this
particular SSIO case.
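One illustration of the idea, sketched under the assumption that
today's single-type DLM is all that's available: a byte range can be
approximated by taking one lock per block the range touches, with the
file and block number encoded into the resource name. The naming
scheme and the lock_range helper here are hypothetical; a real
implementation would $ENQW each name rather than print it.

    #include <stdio.h>

    #define BLOCK_SIZE 512

    /* Map a byte range onto block-granular resource names. */
    static void lock_range(const char *file_id,
                           unsigned long long start,
                           unsigned long long length)
    {
        unsigned long long first = start / BLOCK_SIZE;
        unsigned long long last  = (start + length - 1) / BLOCK_SIZE;

        for (unsigned long long blk = first; blk <= last; blk++) {
            char resnam[64];
            /* Two processes touching the same block collide on the
               same resource name, so the DLM arbitrates for them. */
            snprintf(resnam, sizeof resnam, "SSIO$%s$%llu",
                     file_id, blk);
            printf("would $ENQW a PW lock on %s\n", resnam);
        }
    }

    int main(void)
    {
        /* A 100-byte write at offset 500 spans blocks 0 and 1. */
        lock_range("FID_12_34_0", 500, 100);
        return 0;
    }

A native range-lock type in the DLM would avoid the lock-per-block
overhead, which is presumably the optimization being weighed.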
> As for the I/O. There are several concerns.
>
> First, we're looking at soon having memory for storage, beyond SSDs
> which mimic disks. With such, there is no longer a concept of disk
> blocks. There would be no temporary data in buffers, all work would be
> direct to memory.
True. Though that does depend on how the performance of the
byte-addressable storage compares with that of cache memory and main
memory. We may well encounter memory caches in front of
byte-addressable non-volatile storage, akin to the processor caches
used with main memory. Byte-addressable storage operating at the
speed of SSD would still be very useful, after all. There'll
undoubtedly be some sort of
memory management or mapping or other layering here around memory
access protections, as rogue reads and rogue writes would be bad.
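That layering is already visible on current x86 hardware with
persistent memory: ordinary stores land in volatile processor caches,
and software has to flush the affected cache lines explicitly before
the data can be considered durable. A minimal sketch, assuming a
64-byte cache line and a compiler providing the CLFLUSHOPT intrinsic
(compile with -mclflushopt; on other platforms the names differ):

    #include <immintrin.h>  /* _mm_clflushopt, _mm_sfence */
    #include <stddef.h>

    /* Push a buffer's cache lines toward byte-addressable NVM. */
    static void persist(const void *buf, size_t len)
    {
        const char *p = (const char *)buf;
        for (size_t off = 0; off < len; off += 64)
            _mm_clflushopt((void *)(p + off));
        _mm_sfence();   /* order the flushes before later stores */
    }

So "no buffers, direct to memory" still leaves the caches and their
write-back ordering to be managed.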
> If PostgreSQL is indeed a worthy database product for VMS, and I know
> absolutely nothing about the product, perhaps it can be modified to
> conform to disk I/O capabilities of VMS. This would immediately give
> it all the current VMS capabilities, clusters, mirrored storage, etc.
Chapter 26 of the PostgreSQL manual
https://www.postgresql.org/files/documentation/pdf/10/postgresql-10-US.pdf
has a write-up on what's already available on platforms that
support PostgreSQL, and the different options and trade-offs involved.
For a number of folks and applications, a PostgreSQL failover to a warm
standby server with data that's current to within a second and with the
failover processing requiring under a second is acceptable, and
PostgreSQL can provide that. OpenVMS can potentially do better than
that when configured appropriately and with sufficient hardware, though
cluster failover will need to be configured to meet that one-second or
better window, and the OpenVMS apps will themselves have to be written
for that failover. And if the available failover or replication
mechanisms for PostgreSQL don't meet local application needs, the
PostgreSQL foreign data wrapper interface can allow far more extensive
customizations to be implemented.
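For reference, the core of a PostgreSQL 10 warm-standby configuration
is small; a sketch, with the host name and the replication role name
as placeholders (the details and trade-offs are in that Chapter 26):

    # primary: postgresql.conf
    wal_level = replica        # ship enough WAL for a standby
    max_wal_senders = 4        # allow streaming connections

    # standby: recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=primary.example port=5432 user=replicator'

The standby is typically seeded with pg_basebackup, and promotion on
failover is a pg_ctl promote.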
As for other approaches available on some Linux distros and probably
also elsewhere, it's a fairly involved OpenVMS hardware and software
configuration that does better than what replicated and RAID'd NAS or
what a DRBD configuration provides for developers and applications, for
instance.
I haven't looked at how the DLM and GFS2 clustering bits in RedHat
compare with OpenVMS clustering, though there are some definite
similarities and also some substantial differences. Probably the
biggest difference is the lack of shared write to a file, and that's
not a small difference.
Hadoop is a different approach that VSI has discussed on occasion,
though there are differences there, too. Hadoop ties together both
clustering and the database system, which is a rather different
approach than the one OpenVMS folks are likely familiar with.
Many folks can work with a one-second window and a warm secondary with
PostgreSQL. More than a few OpenVMS folks run those configurations,
too. Sometimes even in a cluster configuration. The implementations
here weigh local requirements, OpenVMS costs (particularly the
clustering and HBVS and related licenses), the fact that servers
reboot far faster than they used to, and many other factors.
> Perhaps as the SSIO appears to attempt, additional I/O capabilities
> should be implemented on VMS. Not sure this is a good approach, as
> long as disks are still in use. Seems to me to be chasing the past,
> not the future.
SSIO is an (incomplete) approach seeking to provide the expected
consistency of what's written to disk.
> Note, most of the above comes from the perspective of software
> architecture, more so than engineering.
OpenVMS itself still adores punched cards. OpenVMS doesn't support
streams all that well.
As for OpenVMS and clustering and HBVS and the rest, there's not a
whole lot of doc on designing and writing and testing apps for
clustering, nor on backing up and converting RMS files on-line.
Here's a box of parts and pieces, and everybody is unfortunately
assumed to know how the pieces fit together, what the trade-offs are,
and the various potential causes of outages, such as ;32767 or ;65535.
But that's all fodder for another discussion or twenty.
As for failover and PostgreSQL and clustering in general... Or Hadoop,
for that matter. This is all part of what OpenVMS is competing with in
the current market. The numbers of folks that need the specific
features of OpenVMS (outside of the installed base, of course) aren't
all that large, but it's the turf that VSI marketing is undoubtedly
aiming at, and opening up the numbers of folks that'll be interested
in the future, and that can grow into these requirements, is where
VSI development is certainly headed.
There are some folks and some apps that can't take a one-second hit.
There are others that can. As for the task ahead of VSI marketing in
this area, prizing folks off of PostgreSQL on RedHat with an offer of
OpenVMS and Oracle Rdb or Oracle classic — or a competitive port of
PostgreSQL or a Hadoop implementation, for that matter — won't be an
easy sale. This is where the low end and the low-end business story
really matter, and OpenVMS doesn't have one of those yet. The x86-64
port and the LLVM work and other tasks currently underway will be part
of that story, and the x86-64 port can (will?) be the first big piece
of the entry level. But there's a whole lot more work involved beyond
the x86-64 port, and SSIO will eventually be part of that. That work
will happen at VSI, and eventually at third-party developers.
--
Pure Personal Opinion | HoffmanLabs LLC