[Info-vax] Large mailboxes - RMS Indexed file internal design

Sat Dec 5 12:02:02 EST 2020

On Friday, December 4, 2020 at 3:41:06 PM UTC-5, Dave Froble wrote:
> On 12/4/2020 11:22 AM, Michael Moroney wrote: 
> > On 12/3/2020 11:41 AM, Hein RMS van den Heuvel wrote: 

> > The final  "fix" was a DCL procedure to replace the file with a fresh one during  "maintenance" times. 
Well done Michael.

> > It just bugged me that simple repeated RMS put/get/delete sequences worked out like this. 
:
> > But I didn't know the difference between an RMS bucket and a pail. 

That made me smile.
A pail does not have an associated key value; other than that they are pretty much the same.
They both hold stuff; smaller pails are lighter and more nimble but you need more of them; bigger stuff needs bigger pails; there is a limit to the size of a pail you can get.
:-)

> > Is there a reason why the equivalent of CONVERT/RECLAIM can't be run for one 
> > bucket only when it becomes empty/unusable? 

It's probably the old OpenVMS adage... if you can't fix a problem 100% for 100% of the anticipated cases, then do not bother.
I've always disliked that approach. I lean towards welcoming solutions which fix 90% of the problem 90% of the time.
Convert(/reclaim) currently just operates on-disk.
It would be wonderful if reclaim in particular had an online variant, even if that were limited to the 'data level' only.
To work online, one would either have to add one or more new RMS calls, or the standalone utility would need to run with CMEXEC privs and learn to take out bucket and area locks. To avoid online locking conflicts, a standalone tool could just try to gather all the locks it needs (empty bucket, bucket pointing to it, area to return the bucket to) and be willing to release them, go to sleep, and try again when a blocking AST is triggered.
I suspect the original engineers were concerned with applications holding on to RFAs, or fast deletes leaving RRVs in secondary indexes, leading to false-positive reads when a bucket and record ID are recycled into use.
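
The "gather all the locks or back off" idea can be sketched in a few lines. This is only an analogy using ordinary Python thread locks, not the VMS lock manager or any RMS interface; the names try_acquire_all and reclaim_with_backoff are made up for illustration, and a failed non-blocking acquire stands in for the blocking AST.

```python
import threading
import time

def try_acquire_all(locks):
    """Take every lock without waiting; on any conflict release what we hold."""
    held = []
    for lock in locks:
        if lock.acquire(blocking=False):
            held.append(lock)
        else:
            # Someone else owns this one: give everything back and report failure.
            for h in reversed(held):
                h.release()
            return False
    return True

def reclaim_with_backoff(locks, reclaim, attempts=5, delay=0.01):
    """Run reclaim() only while holding all locks; sleep and retry on conflict."""
    for _ in range(attempts):
        if try_acquire_all(locks):
            try:
                reclaim()
                return True
            finally:
                for lock in reversed(locks):
                    lock.release()
        time.sleep(delay)  # be polite: let the application finish, then retry
    return False

# Hypothetical stand-ins for the three resources mentioned above.
empty_bucket = threading.Lock()
parent_bucket = threading.Lock()
area = threading.Lock()
ok = reclaim_with_backoff([empty_bucket, parent_bucket, area], lambda: None)
```

The point of the design is that the tool never waits while holding a partial set of locks, so it cannot deadlock with, or stall, the live application; the worst case is that a busy bucket simply does not get reclaimed on this pass.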

> It begins with a fundamental design issue in RMS, at least in my opinion. 
> 
> Data records in an RMS indexed file are ordered by the primary key. 
> Thus any activity that adds or deletes data records must affect that  ordered location. 

Dave, indeed there could have been other/better choices.
Overall RMS has held up pretty nicely over 40 years, going from a few megabytes being a big file to tens, even hundreds, of gigabytes in a file.
Production RMS files now are 1000 times bigger than folks expected back in RMS-11 days.

Nitpicking: RMS itself does NOT support changing primary keys. Applications have to code that themselves with delete + insert, as you write. Only the COBOL RTL hides this from the end user.

> Then secondary keys are not on a one to one correspondence with data 
> records, but rather one key for each unique key and a list of pointers 
> for each data record with the same key value. 

Ok, but I see that just as extreme key compression.
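
To make the "extreme key compression" view concrete, here is a toy comparison of the two layouts Dave describes. It is a sketch only: the record data is invented, RFAs are shown as plain (VBN, ID) tuples, and neither function reflects the real on-disk format of RMS or of the other database.

```python
from collections import defaultdict

# Invented sample data: (rfa, secondary_key_value) pairs.
records = [
    ((10, 1), "SMITH"),
    ((10, 2), "JONES"),
    ((14, 1), "SMITH"),
    ((18, 3), "SMITH"),
]

def build_grouped_index(records):
    """RMS-like: one entry per unique key value, holding a list of pointers."""
    index = defaultdict(list)
    for rfa, key in records:
        index[key].append(rfa)
    return dict(index)

def build_one_to_one_index(records):
    """Alternative: one (key, pointer) entry per data record."""
    return sorted((key, rfa) for rfa, key in records)

grouped = build_grouped_index(records)
one_to_one = build_one_to_one_index(records)

# The key text is stored once per unique value in the grouped layout,
# and once per record in the one-to-one layout.
key_copies_grouped = len(grouped)
key_copies_one_to_one = len(one_to_one)
```

Both layouts carry the same information; the grouped form just stores each duplicated key value once and pays for it with a variable-length pointer list.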

> In the competing database I've been involved with, with design going 
> back to 1974, each data record has unique key records for each key 
> structure on a one for one basis. 

The record RFA ( = RRV ) can be considered that key to some degree.
It is a GUID of sorts, just too bad they (understandably) picked 16 bits for the ID at the time.
The 32-bit VBN is of course also one such unfortunate pick at the time.
Both are hard to fix as they are externally exposed in the RFA. 
The 63-block bucket size limit is just an internal 16-bit buffer size limit.
It was equally understandable coming from RMS-11 days, but it should have been fixed during the Alpha port (just before my time :-).
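
For anyone who wants to see the numbers, the limits above work out as below. This is a back-of-the-envelope sketch: the little-endian 4+2 byte packing is just my illustration of a 32-bit VBN plus a 16-bit ID, not a claim about the exact RAB field layout, and the helper names are invented.

```python
import struct

BLOCK_SIZE = 512  # bytes per disk block

def pack_rfa(vbn, record_id):
    """Pack a 32-bit VBN and a 16-bit record ID into 6 bytes (illustrative layout)."""
    return struct.pack("<IH", vbn, record_id)

def unpack_rfa(raw):
    """Inverse of pack_rfa: returns (vbn, record_id)."""
    return struct.unpack("<IH", raw)

MAX_RECORD_ID = 2**16 - 1                 # 65535 distinct IDs per bucket at most
MAX_VBN = 2**32 - 1                       # highest addressable block number
MAX_FILE_BYTES = MAX_VBN * BLOCK_SIZE     # just under 2 TiB
MAX_BUCKET_BYTES = 63 * BLOCK_SIZE        # 32256 bytes
ONE_MORE_BLOCK = 64 * BLOCK_SIZE          # 32768, one past a signed 16-bit 32767

raw = pack_rfa(1234, 56)
```

Since the 6-byte value is handed out to applications, widening either field changes what callers store, which is why these two are so much harder to fix than the purely internal bucket size limit.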

Rambling on a Snowy day in Nashua, NH.
Hein.