[Info-vax] VMS databases

Jake Hamby (Solid State Jake) jake.hamby at gmail.com
Mon Nov 20 21:52:18 EST 2023


On Monday, November 20, 2023 at 4:19:04 PM UTC-8, Arne Vajhøj wrote:
> On 11/20/2023 11:50 AM, Stephen Hoffman wrote: 
> > On 2023-11-19 22:22:37 +0000, Jan-Erik Söderholm said: 
> >> Den 2023-11-19 kl. 18:37, skrev Stephen Hoffman: 
> >>> As with many things in IT, that depends. 
> >>> 
> >>> Expectations and related sizes can also differ. What can be 
> >>> considered a small database for SQLite can potentially be considered 
> >>> a large database for OpenVMS, for instance. 
> >>> 
> >>> Expectations? SQLite tops out at 256 TiB databases, while OpenVMS 
> >>> file storage tops out at 2 TiB files absent 'heroic' efforts. 
> >> 
> >> A single Rdb database can have 8192 storage areas (individual files) 
> >> so the max database size would be something like 16.000 TiB. 
> > 
> > So... Heroic _and_ expensive.
> I believe that people would: 
> * switch from SQLite to a database server way before 256 TB 
> * switch from Rdb to Wide Column Data Store NoSQL database (like 
> HBase or Cassandra/ScyllaDB) way before 16 PB 
> 
> Arne

Those are very reasonable expectations. It's not impossible that Rdb can scale to extremely large sizes despite the limits on individual RMS files and volumes. Sometimes my mind boggles at the fact, not very well known and quite unnerving to think about, that all the world's banking, airline reservations, income tax returns, and everything else on mainframes is held on volumes that are multiples of the size of an IBM 3390 disk drive circa 1990.

As of 2021, in IBM's DS8880 mainframe storage array:
https://www.ibm.com/docs/en/ds8880/8.5.4?topic=features-extended-address-volumes-ckd

"Count key data (CKD) volumes now support the additional capacity of 1 TB. The 1 TB capacity is an increase in volume size from the previous 223 GB.

You can create a 1 TB IBM Z CKD volume. An IBM Z CKD volume is composed of one or more extents from a CKD extent pool. CKD extents are 1113 cylinders in size. When you define a IBM Z CKD volume, you must specify the number of cylinders that you want for the volume. The storage system and the z/OS have limits for the CKD EAV sizes. You can define CKD volumes with up to 1,182,006 cylinders, about 1 TB on the DS8880."

Just think: you get EBCDIC and track/cylinder/block sizing in odd multiples. Yet they get by somehow, chaining together thousands and thousands of emulated mainframe volumes. Anything VMS does that's weird is downright straightforward compared to System/360 legacy I/O standards. IBM has thousands of patents on increasingly complicated ways of making it all go fast.

As a kid, I read all the details in "Inside Commodore DOS" about the weird features built into the Commodore 1541 disk drive, including the "relative" file format that almost nobody used (GEOS reused the file type for its own format, whose details the drive ROM didn't know, so you couldn't run "Validate" without the drive freeing up sectors that were actually in use by GEOS files). The relative format held fixed-length records of up to 254 bytes, which you could read and write in random order, rather than only sequentially as the "SEQ" file type would indicate. Most programmers either loaded everything into RAM or used direct track/sector addressing (perhaps with copy protection) because the firmware was so limited, right down to the notoriously slowed-down serial I/O that worked around an early C64 motherboard misdesign preventing the use of interrupt-driven I/O.
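
The programming model itself was simple, though: records addressed by number at fixed offsets. Here's a rough, purely generic sketch of that model in ordinary C; my own illustration, nothing to do with how the 1541 actually laid records out on disk:

    #include <stdio.h>

    /* Generic sketch of fixed-length-record random access, in the spirit
       of Commodore REL files: record N lives at byte offset
       (N - 1) * RECORD_LEN, so any record can be read or rewritten
       without scanning the file sequentially. Illustrative only; not
       Commodore DOS or RMS code. */
    #define RECORD_LEN 254  /* REL records topped out at 254 data bytes */

    int read_record(FILE *f, long recno, unsigned char buf[RECORD_LEN])
    {
        if (fseek(f, (recno - 1) * RECORD_LEN, SEEK_SET) != 0)
            return -1;
        return fread(buf, 1, RECORD_LEN, f) == RECORD_LEN ? 0 : -1;
    }

    int write_record(FILE *f, long recno, const unsigned char buf[RECORD_LEN])
    {
        if (fseek(f, (recno - 1) * RECORD_LEN, SEEK_SET) != 0)
            return -1;
        return fwrite(buf, 1, RECORD_LEN, f) == RECORD_LEN ? 0 : -1;
    }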

Conversely, VMS's record-management features appear to be quite useful for the COBOL, Fortran, BASIC, BLISS, MACRO, etc. programs that co-evolved with the OS. Beyond the scale of the built-in record I/O features of those languages, you moved up to Rdb or to CODASYL DBMS, another 1980s DEC database product that Oracle still sells support for, and one I'd guess is even less well-known than Rdb.

The metadata for records and the record-management side of RMS are increasingly uninteresting these days. What has to be fast are the fundamental features of any filesystem: access control, mapping logical blocks to physical disk extents, and the other block-level machinery. More than anything else, it has to be reliable, secure, and consistent. ODS-5 is fairly ponderous and isn't trying to do anything too clever: directory entries are kept alphabetized and searched in O(log n), rather than in some fancy B-tree structure that's also O(log n), and there are multiple backup copies of all the important data structures. As a developer, you can make reasonable assumptions about what it is and isn't going to do with your data, and when. It's not going to be especially competitive for some of the use cases that ext4, NTFS, or some other journaled filesystem is faster at.
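
To make the O(log n) point concrete, here's a tiny sketch of the idea, a plain binary search over alphabetized entry names. My own illustration, not ODS-5 code:

    #include <string.h>

    /* Minimal sketch: binary search over an alphabetized array of names,
       the same O(log n) lookup you get from a sorted directory without
       needing a B-tree. Illustrative only; not how ODS-5 lays out its
       directory blocks. */
    int find_entry(const char *const names[], int count, const char *wanted)
    {
        int lo = 0, hi = count - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int cmp = strcmp(wanted, names[mid]);
            if (cmp == 0)
                return mid;      /* found: index of the entry */
            if (cmp < 0)
                hi = mid - 1;
            else
                lo = mid + 1;
        }
        return -1;               /* not found */
    }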

From looking at the libuv source code, Linux has only very recently evolved proper high-speed async I/O, with real-world code having to check for specific kernel versions and feature flags to use the new io_uring properly. It's an API you can watch evolving in real time. The latest optimization I read about is that you can now hand the kernel a pool of buffers to choose from for incoming packets, so the programmer doesn't have to manage which buffer slot receives which async submission's result. They're probably chasing microbenchmark results and increasingly diminishing returns at this point.
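
For anyone who hasn't looked at it, the core liburing flow is pretty small. A minimal sketch, assuming liburing is installed; the path is just an example and error handling is mostly elided:

    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Minimal sketch of an async read with liburing: queue one read SQE,
       submit it, then wait for its completion. Real code would batch
       submissions and harvest completions rather than block on each one. */
    int main(void)
    {
        struct io_uring ring;
        char buf[4096];

        if (io_uring_queue_init(8, &ring, 0) < 0)
            return 1;

        int fd = open("/etc/hostname", O_RDONLY);   /* example path only */
        if (fd < 0)
            return 1;

        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
        io_uring_submit(&ring);

        struct io_uring_cqe *cqe;
        if (io_uring_wait_cqe(&ring, &cqe) == 0) {
            printf("read returned %d bytes\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }

        close(fd);
        io_uring_queue_exit(&ring);
        return 0;
    }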

While NT, Solaris, the BSDs, and any other OS with poll() or some variant gets non-blocking TCP/IP socket I/O in libuv, only Linux's io_uring seems to have gone the distance and let callers issue the entire gamut of fs operations: mkdir, delete, fstat, symlink, etc., as async I/O requests. Every other OS, including Win32, has to use a thread pool to perform at least some of those synchronously, especially the filesystem ones. If it turns out that being able to do those kinds of fs ops asynchronously is genuinely useful, Linux and VMS would both have an advantage. I'm guessing that socket multiplexing is the more important thing to get right first, so I'm going to look at that before the filesystem.
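
The portable libuv version of an async filesystem call looks roughly like the sketch below; depending on platform and kernel, libuv completes it on its worker thread pool or, on recent Linux, through io_uring, then runs the callback on the loop thread. The path is just a placeholder:

    #include <stdio.h>
    #include <uv.h>

    /* Sketch of an async stat through libuv: the request is queued on the
       event loop and the callback fires when the operation completes. */
    static void on_stat(uv_fs_t *req)
    {
        if (req->result == 0)
            printf("size of %s: %llu bytes\n", req->path,
                   (unsigned long long)req->statbuf.st_size);
        else
            fprintf(stderr, "stat failed: %s\n", uv_strerror((int)req->result));
        uv_fs_req_cleanup(req);
    }

    int main(void)
    {
        uv_loop_t *loop = uv_default_loop();
        uv_fs_t req;

        uv_fs_stat(loop, &req, "/etc/hostname", on_stat);  /* example path */
        uv_run(loop, UV_RUN_DEFAULT);   /* runs until the request completes */
        return 0;
    }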

Cheers,
Jake


