[Info-vax] Roadmap

Mon Jan 7 13:19:42 EST 2019

On 2019-01-06 21:22:26 +0000, Dave Froble said:

> As for data stored in files, it's a different problem.  The issue is 
> whether one can accurately retrieve data.
> 
> One solution might be some type of identifier for each piece of 
> floating point data, which would be respected by whatever code moved 
> data from file to memory, and back.  I'd expect some type of 
> performance hit from such a scheme.

That's heading closer to object storage.  On OpenVMS, a descriptor 
might be chosen, given the lack of support for objects.  There are 
descriptors for different sorts of data, beyond the ASCII-oriented 
string "fun" that can be familiar to OpenVMS developers.   And yes, 
there's overhead involved here in the serialization and 
deserialization.  How much that matters depends on frequency of access, 
performance requirements, storage and... all the usual sorts of design 
and implementation trade-offs.
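
As a rough sketch of the per-value identifier idea, assuming a 
hypothetical one-byte tag in front of each value (the tag numbers and 
function names here are made up for illustration, and are not OpenVMS 
descriptor class or data type codes):

```python
import struct

# Hypothetical tag values, invented for this sketch.
TAG_BINARY32 = 1  # IEEE 754 single precision
TAG_BINARY64 = 2  # IEEE 754 double precision

# Big-endian struct formats, so the stored bytes do not depend on the host.
_FORMATS = {TAG_BINARY32: ">f", TAG_BINARY64: ">d"}

def pack_tagged(tag, value):
    """Return one tag byte followed by the encoded value."""
    return struct.pack(">B", tag) + struct.pack(_FORMATS[tag], value)

def unpack_tagged(buf):
    """Read the tag, then decode the payload the way the tag says."""
    if not buf:
        raise ValueError("empty buffer")
    tag = buf[0]
    fmt = _FORMATS.get(tag)
    if fmt is None:
        raise ValueError(f"unknown float tag {tag}")
    if len(buf) != 1 + struct.calcsize(fmt):
        raise ValueError("payload length does not match tag")
    (value,) = struct.unpack(fmt, buf[1:])
    return tag, value
```

The extra byte and the lookup on every read are where the performance 
hit comes from.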

JSON, YAML, XML, TLV or CSV file storage would be typical text archival 
and transportable formats in recent years, or (depending on what the 
particular requirements might be) a text file of SQL or database-related 
commands.  The overhead of an HDD I/O request can mask a whole lot of 
the overhead from the text-binary conversion processing involved in 
serialization, marshalling or export.

I'd steer away from TLV and definitely away from CSV, but those can work.
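
A minimal sketch of a text export that still gets the same bits back, 
assuming JSON with a made-up "type" field and the value written as hex 
digits (the field names and the two functions are hypothetical):

```python
import json

TYPE_TAG = "ieee754-binary64"  # hypothetical tag string for this sketch

def export_value(x):
    """Serialize one float as JSON text: a type tag plus exact hex digits."""
    return json.dumps({"type": TYPE_TAG, "hex": float(x).hex()})

def import_value(text):
    """Parse the JSON text and rebuild the float from its hex digits."""
    obj = json.loads(text)
    if not isinstance(obj, dict) or obj.get("type") != TYPE_TAG:
        raise ValueError("unexpected type tag")
    return float.fromhex(obj["hex"])
```

The hex form avoids any question about how many decimal digits the 
writer and the reader each think are enough.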

And it's not just tagging floating point here, as integer sizes and 
character encoding defaults and languages can vary.  OpenVMS and most 
OpenVMS apps just don't use or don't consider UTF-8 encoding, and 
similarly don't often consider language-specific sorting and related 
"fun".
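
For the integer size and encoding part, a sketch that spells out the 
width, byte order and character encoding rather than trusting platform 
defaults.  The record layout (a 32-bit signed count, a 16-bit length, 
then UTF-8 bytes) is an assumption for illustration only:

```python
import struct

HEADER = "<iH"  # little-endian: 32-bit signed int, 16-bit unsigned length
HEADER_SIZE = struct.calcsize(HEADER)  # 6 bytes, no padding with "<"

def pack_record(count, name):
    """Encode the name as UTF-8 and prefix it with a fixed-size header."""
    data = name.encode("utf-8")
    return struct.pack(HEADER, count, len(data)) + data

def unpack_record(buf):
    """Decode a record produced by pack_record."""
    count, length = struct.unpack_from(HEADER, buf, 0)
    data = buf[HEADER_SIZE:HEADER_SIZE + length]
    if len(data) != length:
        raise ValueError("truncated record")
    return count, data.decode("utf-8")
```

Note the length is in bytes and not characters; "Zoë" is three 
characters but four bytes in UTF-8, and three bytes in Latin-1.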

Marshalling and unmarshalling APIs are commonly available on various 
(other) platforms.  Some with opaque storage, some with transportable 
storage, some with both.

This whole area can be quite problematic around security too, 
particularly whenever untrusted data is being 
deserialized/imported/unmarshalled.  Simon's CVE is far from the only 
example of security bugs secondary to parsing errors or incautiousness.
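
A sketch of the cautious approach when the input is untrusted: use a 
data-only parser, cap the size, and check the shape and range of what 
comes back before using it.  The size limit, the field names and the 
function name are all assumptions for illustration:

```python
import json
import math

MAX_BYTES = 256                # hypothetical limit for one record
TYPE_TAG = "ieee754-binary64"  # hypothetical tag string

def load_reading(raw):
    """Return the float in an untrusted JSON record, or raise ValueError."""
    if len(raw) > MAX_BYTES:
        raise ValueError("input too large")
    try:
        obj = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError, RecursionError) as exc:
        raise ValueError("not valid UTF-8 JSON") from exc
    if not isinstance(obj, dict) or set(obj) != {"type", "value"}:
        raise ValueError("unexpected structure")
    if obj["type"] != TYPE_TAG:
        raise ValueError("unsupported type tag")
    value = obj["value"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("value is not a number")
    try:
        value = float(value)
    except OverflowError as exc:
        raise ValueError("value out of range") from exc
    # Python's json module accepts NaN and Infinity by default.
    if not math.isfinite(value):
        raise ValueError("non-finite value")
    return value
```

The same record read with something like Python's pickle would be 
handing the sender the ability to run code, which is the sort of 
incautiousness involved here.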

Where performance is a factor, binary storage formats are usually 
preferable.  But that's usually part of the database, and not part of 
the archival processing.  SQLite is handy here for some cases too, as 
the database format is transportable across platforms.
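
A small sketch of the SQLite case, using Python's sqlite3 module and a 
made-up table and column name.  An in-memory database is used here to 
keep the example self-contained; a file path in place of ":memory:" 
gives the file that can be copied to another platform:

```python
import sqlite3

def round_trip(values, path=":memory:"):
    """Store floats in a REAL column and read them back in insert order."""
    con = sqlite3.connect(path)
    try:
        # Hypothetical schema for this sketch.
        con.execute(
            "CREATE TABLE reading (id INTEGER PRIMARY KEY, value REAL NOT NULL)"
        )
        con.executemany(
            "INSERT INTO reading (value) VALUES (?)", [(v,) for v in values]
        )
        con.commit()
        rows = con.execute("SELECT value FROM reading ORDER BY id").fetchall()
        return [row[0] for row in rows]
    finally:
        con.close()
```

SQLite stores REAL values as 8-byte IEEE floating point, so the 
float format question gets answered once by the database file format 
and not by each application.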

-- 
Pure Personal Opinion | HoffmanLabs LLC