[Info-vax] C... the only winning move is not to play...
Hein RMS van den Heuvel
heinvandenheuvel at gmail.com
Wed Feb 19 14:57:45 EST 2014
On Tuesday, February 18, 2014 7:24:50 PM UTC-5, David Froble wrote:
> Hein RMS van den Heuvel wrote:
:
> > Indexed files (except RSX PROLOGUE-1) have the primary key moved to the front of the record and may be compressed.
> This is something new to me. Can you be a bit more specific?
Not really... it says it all.
1) The Index structure points to DATA BUCKETS with multiple records in a key range.
2) Within the data bucket there is no table of contents or some such. Records are ordered by primary key.
3) To find a requested target record (by key value) RMS will have to compare (expanded) primary keys... but only keys.
4) It would be silly to have to expand the whole record to find the primary key bits.
5) Typical data patterns allow for better compression for keys than data. For data RMS just consolidates repeating characters. For keys RMS uses lead compression based on the prior key, and repeating character tail compression based on the known total key size.
So the key in its entirety is lifted from its original place (often byte 0), optionally compressed, and stored, immediately followed by the segments around the key fields (often just the bytes following it), and that data has its own (optional) compression. But it stays one record, one variable length 'blob', with one RFA and one flag byte.
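To make the key compression in point 5 concrete, here is a minimal Python sketch. It is not the actual RMS on-disk encoding: the function names and the (lead count, remaining bytes) representation are made up for illustration. It only shows the two ideas, sharing leading bytes with the prior key and dropping the repeated tail because the total key size is known.

```python
def compress_key(prev: bytes, key: bytes) -> tuple[int, bytes]:
    """Illustrative only: return (lead_count, stored_bytes) for key relative to prev."""
    # Lead compression: count the leading bytes shared with the prior key.
    lead = 0
    while lead < len(key) and lead < len(prev) and prev[lead] == key[lead]:
        lead += 1
    rest = key[lead:]
    if rest:
        # Tail compression: keep a single copy of the repeated last byte.
        last = rest[-1:]
        rest = rest.rstrip(last) + last
    return lead, rest


def expand_key(prev: bytes, lead: int, rest: bytes, key_size: int) -> bytes:
    """Rebuild the full key from the prior key and the stored bytes."""
    body = prev[:lead] + rest
    if len(body) < key_size:
        # The total key size is known, so pad by repeating the last byte.
        body += body[-1:] * (key_size - len(body))
    return body
```

With a 10-byte key, "SMITH" plus five spaces following "SMART" plus five spaces is stored as a lead count of 2 and the four bytes "ITH ". Expanding needs the prior key, which is why RMS compares expanded keys while walking the bucket in order.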
JFM> This is new to me ! So if my primary key is defined as being at offset
24 for 4 bytes, on disk it is actually stored at offset 0 for 4 bytes ?
Yes. Just DUMP a data bucket and check!
JFM> the user is given a pointer to the actual start of the record (after the copy of primary key)
NO. RMS re-assembles the record from its on-disk format into a record buffer (IRAB pointer RECADDR) and then copies that record buffer to a user specified address (RAB pointer UBF).
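For JFM's example (key at offset 24 for 4 bytes), the move-to-front and the re-assembly can be sketched as below. This is a simplification under stated assumptions: a single-segment key, no compression, and hypothetical function names. It is not RMS code.

```python
def to_disk(record: bytes, key_pos: int, key_len: int) -> bytes:
    """Lift the key out of the record and store it in front of the remaining bytes."""
    key = record[key_pos:key_pos + key_len]
    return key + record[:key_pos] + record[key_pos + key_len:]


def from_disk(stored: bytes, key_pos: int, key_len: int) -> bytes:
    """Re-assemble the user-visible record from the stored form."""
    key, rest = stored[:key_len], stored[key_len:]
    return rest[:key_pos] + key + rest[key_pos:]
```

The caller always sees the record in its original layout. The key-first form exists only in the bucket.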
David> My understanding is that the secondary keys have as the data portion of
the key a pointer to the actual data record,
Yes.
David> and I believe that with duplicate keys, there is one key and a list of associated data record pointers.
Yes, array-of RFA + Flag following key bytes. Fresh key+array for continuation buckets.
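As a rough picture of "one key, array of flag + RFA", a Python sketch follows. The class and field names are hypothetical, and the RFA is shown simply as a pair of integers; the real on-disk layout is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecordPointer:
    flag: int              # per-pointer flag byte
    rfa: tuple[int, int]   # illustrative stand-in for the record's RFA


@dataclass
class SecondaryEntry:
    key: bytes                                        # key bytes, stored once
    pointers: list[RecordPointer] = field(default_factory=list)


def add_duplicate(index: dict[bytes, SecondaryEntry], key: bytes,
                  rfa: tuple[int, int], flag: int = 0) -> None:
    """Append one more record pointer to the array under an existing or new key."""
    entry = index.setdefault(key, SecondaryEntry(key))
    entry.pointers.append(RecordPointer(flag, rfa))
```

Two records with the same secondary key value then give one entry with two pointers, rather than two key+pointer pairs.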
David> Just as a point of discussion, the DAS database product I implemented 30
years ago treated all keys the same, the key and a pointer.
Sure. Many 'databases' work that way. RMS does not.
At the time it was designed it was deemed more/most important to be able to scan a file in primary key order.
It also can reduce a b-tree index layer, as each index end-node points to several (typically 3 - 50) data records. Assuming a fixed data bucket size, following an index pointer would always read a bunch of records, so you might as well derive value from that.
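A back-of-the-envelope version of the index layer argument, in Python. The numbers (500,000 records, 100 entries per index bucket, 50 records per data bucket) are assumptions picked for illustration, not measurements from any real file.

```python
def index_levels(entries: int, fanout: int) -> int:
    """Count the index levels needed to narrow `entries` lowest-level pointers to one root."""
    levels = 0
    while entries > 1:
        entries = -(-entries // fanout)  # ceiling division: buckets needed at this level
        levels += 1
    return levels


records = 500_000
fanout = 100            # assumed entries per index bucket
per_data_bucket = 50    # assumed records per data bucket

per_record_pointers = index_levels(records, fanout)
per_bucket_pointers = index_levels(records // per_data_bucket, fanout)
```

With these assumed numbers, pointing at every record needs 3 index levels, while pointing at data buckets needs 2. Other numbers may show no saving at all, which is why it "can" reduce a layer rather than always doing so.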
David> Data records could be anywhere, and keys could be changed without actually
moving the data record.
Scatter/Gather.
Typical RMS usage proves the right choice was made. Records are frequently, dominantly, read in order. Many IOs were saved over the decades.
>>A better idea, I thought, than the RMS requirement of deleting the old record and adding a new record if changing the primary key, since the data record location is dependent upon the primary key value.
Typical RMS usage proves the right choice was made.
Your average RMS usage has zero desire to change a primary key... and as you outline there is a workaround in DEL+PUT.
There is another reason for NOT storing the records in primary key order, and that reason would be to avoid (data) BUCKET SPLITS. If you can just 'add to the end', adding records gets easier... As long as the data is securely written out, you can take some freedom updating the index. But the contention for 'the end' may make things harder in turn (if, as is often done, a single last bucket and write-target is used).
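To show what a bucket split costs in the ordered layout, a toy Python model follows. It assumes integer keys, a fixed bucket capacity counted in records, and a simple split-in-half rule; none of that is taken from the RMS implementation.

```python
import bisect


def insert_ordered(buckets: list[list[int]], key: int, capacity: int) -> bool:
    """Insert key while keeping primary key order; return True if a bucket had to split."""
    if not buckets:
        buckets.append([key])
        return False
    # Pick the first bucket whose highest key is >= key (or the last bucket).
    highs = [b[-1] for b in buckets]
    i = min(bisect.bisect_left(highs, key), len(buckets) - 1)
    bucket = buckets[i]
    bisect.insort(bucket, key)
    if len(bucket) <= capacity:
        return False
    # Bucket overflow: move the upper half into a new bucket right after it.
    half = len(bucket) // 2
    buckets[i:i + 1] = [bucket[:half], bucket[half:]]
    return True
```

Inserting 10, 20, 30, 40 with a capacity of 4 fills one bucket; inserting 25 next has to land in the middle and forces a split. A store that only appends at the end never has that case, at the price of everybody queueing for the same last bucket.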
Cheers,
Hein