[Info-vax] Roadmap
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Mon Jan 7 13:19:42 EST 2019
On 2019-01-06 21:22:26 +0000, Dave Froble said:
> As for data stored in files, it's a different problem. The issue is
> whether one can accurately retrieve data.
>
> One solution might be some type of identifier for each piece of
> floating point data, which would be respected by whatever code moved
> data from file to memory, and back. I'd expect some type of
> performance hit from such a scheme.
That's heading closer to object storage. On OpenVMS, a descriptor
might be chosen, given the lack of support for objects. There are
descriptors for different sorts of data, beyond the ASCII-oriented
string "fun" that can be familiar to OpenVMS developers. And yes,
there's overhead involved here in the serialization and
deserialization. There are obviously various factors lurking here:
frequency of access, performance requirements, storage and... all the
usual sorts of design and implementation trade-offs.
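As a rough sketch of attaching a type identifier to each stored value (in the spirit of a descriptor, though not the OpenVMS descriptor layout itself), here is a minimal tag-length-value encoder and decoder in Python. The type codes, the 1-byte tag and the 4-byte big-endian length are assumptions made up for illustration, not any existing OpenVMS or standard format.

```python
import struct

# Hypothetical type codes for this sketch; a real format would define its own.
TYPE_INT32 = 1
TYPE_FLOAT64 = 2
TYPE_UTF8 = 3

HEADER = struct.Struct(">BI")  # 1-byte tag, 4-byte big-endian length


def encode_tlv(tag, value):
    """Encode one value as tag, length, then payload bytes."""
    if tag == TYPE_INT32:
        payload = struct.pack(">i", value)
    elif tag == TYPE_FLOAT64:
        payload = struct.pack(">d", value)  # IEEE 754 binary64, big-endian
    elif tag == TYPE_UTF8:
        payload = value.encode("utf-8")
    else:
        raise ValueError(f"unknown tag {tag}")
    return HEADER.pack(tag, len(payload)) + payload


def decode_tlv(buf):
    """Return (tag, value) pairs from a buffer of concatenated TLV records."""
    out = []
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < HEADER.size:
            raise ValueError("truncated header")
        tag, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        payload = buf[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        offset += length
        if tag == TYPE_INT32:
            value = struct.unpack(">i", payload)[0]
        elif tag == TYPE_FLOAT64:
            value = struct.unpack(">d", payload)[0]
        elif tag == TYPE_UTF8:
            value = payload.decode("utf-8")
        else:
            raise ValueError(f"unknown tag {tag}")
        out.append((tag, value))
    return out
```

The per-value tag and length are the overhead Dave anticipated: five extra bytes per value here, plus the packing and unpacking work.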
JSON, YAML, XML, TLV or CSV file storage would be typical archival and
transportable formats in recent years, or (depending on what the
particular requirements might be) a text file of SQL or database-related
commands. The overhead of an HDD I/O request can mask a whole lot of
overhead from serialization/marshalling/export-related text-binary
conversion processing.
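For the text formats, here is a minimal Python sketch of JSON storage for floating point values, with an explicit type tag on each value. Python's json module writes floats using repr(), the shortest decimal string that reads back as the same IEEE 754 double, so the round trip here is exact; other languages and libraries may not make that guarantee. The "type"/"value" field names and the "float64" tag are hypothetical.

```python
import json


def to_json(values):
    # Tag each number with its intended type so a reader on another
    # platform does not have to guess (hypothetical field names).
    records = [{"type": "float64", "value": v} for v in values]
    # allow_nan=False: NaN and Infinity are not valid JSON, so refuse to
    # write them rather than emit something stricter parsers will reject.
    return json.dumps(records, allow_nan=False)


def from_json(text):
    values = []
    for record in json.loads(text):
        if record["type"] != "float64":
            raise ValueError(f"unexpected type tag: {record['type']!r}")
        values.append(float(record["value"]))
    return values
```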
I'd steer away from TLV and definitely away from CSV, but those can work.
And it's not just tagging floating point here, as integer sizes and
character encoding defaults and languages can vary. OpenVMS and most
apps either don't use or don't consider UTF-8 encoding, and similarly
often don't consider language-specific sorting and related "fun".
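To illustrate the encoding and sorting points, here is a short Python sketch: the same characters produce different bytes under UTF-8 and ISO 8859-1 (Latin-1), decoding with the wrong one fails silently, and Python's plain sorted() orders strings by Unicode code point, which is not language-specific collation. The sample words are arbitrary.

```python
def encodings_of(text):
    # The same characters become different bytes depending on the encoding,
    # so the encoding needs to be recorded alongside the data.
    return {enc: text.encode(enc) for enc in ("utf-8", "latin-1")}


def misread(text):
    # UTF-8 bytes decoded as Latin-1: no error is raised, just wrong text.
    return text.encode("utf-8").decode("latin-1")


words = ["zebra", "apple", "Banana", "éclair"]

# Code-point order: uppercase sorts before lowercase, and "é" sorts after "z".
codepoint_order = sorted(words)

# Case-insensitive, but still not language-aware: "éclair" stays after
# "zebra", where English or French collation would place it before.
# Proper collation needs locale or ICU support.
casefold_order = sorted(words, key=str.casefold)

if __name__ == "__main__":
    print(encodings_of("café"))
    print(misread("café"))
    print(codepoint_order)
    print(casefold_order)
```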
Marshalling and unmarshalling APIs are commonly available on various
(other) platforms. Some provide opaque storage, some transportable
storage, some both.
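As one small illustration of opaque versus transportable layouts, Python's struct module can pack the same record either in the platform's native layout or in a fixed, standard layout. This uses struct purely as a stand-in; it is not any particular platform's marshalling API.

```python
import struct

record = (42, 0.1)  # an int32 and a float64

# "@": native byte order, sizes and alignment. The layout follows this
# platform (padding may be inserted), so treat these bytes as opaque.
native = struct.pack("@id", *record)

# "<": little-endian, standard sizes, no padding. The layout is fixed,
# so any platform using the same format string reads the same values.
portable = struct.pack("<id", *record)


def read_portable(blob):
    return struct.unpack("<id", blob)


if __name__ == "__main__":
    print(len(native), len(portable))  # native length varies by platform
    print(read_portable(portable))
```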
This whole area can also be quite problematic for security,
particularly whenever untrusted data is being
deserialized/imported/unmarshalled. Simon's CVE is far from the only
example of security bugs secondary to parsing errors or incautiousness.
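On the security side, here is a minimal sketch of cautious import of untrusted JSON in Python: cap the input size, reject the non-finite constants that Python's parser otherwise accepts, and check types and ranges before using anything. The 64 KiB limit and the expected shape (a flat array of numbers) are assumptions for illustration. This is not a complete defense; Python's pickle, for instance, should never be used on untrusted input at all.

```python
import json
import math

MAX_INPUT_BYTES = 64 * 1024  # hypothetical limit; size it to the application


def _reject_constant(name):
    # json.loads calls this for NaN, Infinity and -Infinity, which
    # Python accepts by default although they are not valid JSON.
    raise ValueError(f"non-finite constant not allowed: {name}")


def load_untrusted(text):
    """Parse a JSON array of finite numbers, raising ValueError on anything else."""
    if len(text.encode("utf-8")) > MAX_INPUT_BYTES:
        raise ValueError("input too large")
    # Malformed JSON raises json.JSONDecodeError, a subclass of ValueError.
    data = json.loads(text, parse_constant=_reject_constant)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    result = []
    for item in data:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(item, bool) or not isinstance(item, (int, float)):
            raise ValueError(f"unexpected element: {item!r}")
        try:
            value = float(item)
        except OverflowError:  # integer too large for a double
            raise ValueError("number out of range") from None
        if not math.isfinite(value):  # e.g. 1e400 parses as infinity
            raise ValueError("number out of range")
        result.append(value)
    return result
```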
Where performance is a factor, binary storage formats are usually
preferable. But that's usually part of the database, and not part of
the archival processing. SQLite is handy here for some cases too, as
the database format is transportable across platforms.
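A minimal sketch of binary storage via SQLite, using Python's standard sqlite3 module. SQLite documents REAL values as 8-byte IEEE floating point numbers, so doubles come back unchanged. The table and column names are made up, and ":memory:" keeps the sketch self-contained; a file path instead would give a database file that can be copied to another platform.

```python
import sqlite3


def roundtrip_reals(values):
    """Store floats in a REAL column and read them back in insertion order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, reading REAL)")
        conn.executemany(
            "INSERT INTO samples (reading) VALUES (?)", [(v,) for v in values]
        )
        conn.commit()
        rows = conn.execute("SELECT reading FROM samples ORDER BY id").fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()
```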
--
Pure Personal Opinion | HoffmanLabs LLC