[Info-vax] Portable OpenVMS binary data format?

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Thu Aug 9 12:21:08 EDT 2018


On 2018-08-09 08:48:07 +0000, John E said:

>> 
>> You have been exceedingly stingy at providing details such as the 
>> volume of data involved,
> Because it's not one use case I'm considering, it's a multitude of use 
> cases (with both large and small datasets, and some stuff that only 
> needs to be transferred one time and in one direction and other stuff 
> that needs to be transferred multiple times and in both directions).  I 
> just wanted some general guidelines and info and am very grateful to 
> you and others for the info.
> 
>>> E.g. you can process and whittle down a large Stata dataset on a huge 
>>> Linux machine with tons of disk & memory, then download the smaller 
>>> dataset to your laptop for continued work.  Super convenient and 
>>> something to aspire to?
>> Well, that's a pretty bad design, actually.  Primitive.  Clumsy.  I'd 
>> expect these and other tools — and the case you're working on here — 
>> would work as well with remote access to text or binary data over SSL, 
>> without having to transfer the files around.
> 
> I'm not going to argue this but don't really see what point you're 
> trying to make.  In my mental model, SAS & Stata datasets, HDF files, 
> and CDF files are all trying to do more or less the same thing: 
> provide portable binary formats with metadata.  I don't understand why 
> this is OK for CDF but not for SAS or Stata.  But I also only vaguely 
> understand CDF as being similar to HDF, which I'm more familiar with.
> 
> And I'm also not a huge lover of the SAS & Stata walled-garden 
> approach, but if you're stuck in those lovely gardens the portable data 
> formats are a nice feature, at least.

I'm referring to the whole approach.  It's primitive and clumsy.  Now 
it will definitely work once the bugs are resolved, so there's that.  
Why clumsy?  You're effectively asking for help creating a bespoke 
binary backup tool here as part of your bespoke database, and then 
planning to use that tool for interchange.  If you want or need to do 
that, have at.

Maybe have a look at CDF or some other standard format, to reduce the 
amount of code and ease the transportability of the data.  Bespoke 
formats mean you get to deal (more) with other folks wanting to access 
the data.
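If it helps, a minimal sketch of that idea using the HDF5 C API (HDF 
came up above, and CDF is similar in spirit).  The file name and 
dataset path here are invented for illustration:

#include "hdf5.h"

int main(void)
{
    double data[4] = { 1.0, 2.0, 3.0, 4.0 };
    hsize_t dims[1] = { 4 };

    /* Create the file, a 1-D dataspace, and one double-precision
       dataset stored as little-endian IEEE on every platform. */
    hid_t file  = H5Fcreate("export.h5", H5F_ACC_TRUNC,
                            H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "/measurements", H5T_IEEE_F64LE,
                             space, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* The library converts from the host's native byte order and
       float format; no bespoke serialization code to maintain. */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}

That's the selling point of the standard formats: the endian and 
float-format handling is the library's problem, and every tool that 
reads HDF5 can open the result.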

When looking at live data, use a network connection into the server 
with YAML or XML or their ilk via HTTPS, or a bespoke socket with 
whatever text or binary format via TCP or SSL.  DEC was offering 
developers this sort of schtick decades ago, with the LiveLink package; 
it's not a particularly new approach.  Microsoft, SAS, SAP and others 
now provide that same sort of online access.  Either vacuum the 
database and effectively export the whole thing directly into a 
spreadsheet (without the file and without the FTP, and preferably with 
TLS and credentials), or let the remote client access the live data.
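The bespoke-socket variant is only a few lines of client code.  A 
rough sketch over plain TCP (the host, port, and request line below 
are all invented, and anything real wants TLS layered on top):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in addr;
    char buf[4096];
    ssize_t n;

    int s = socket(AF_INET, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8042);                       /* made-up port */
    inet_pton(AF_INET, "192.0.2.10", &addr.sin_addr);  /* made-up host */

    if (s < 0 || connect(s, (struct sockaddr *)&addr, sizeof addr) < 0)
        return 1;

    /* The request text is whatever protocol you define for the
       socket; this one asks for changes since the last transfer. */
    const char *req = "GET changes-since-last\n";
    write(s, req, strlen(req));

    while ((n = read(s, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(s);
    return 0;
}

No export file, no FTP, and the client sees current data rather than 
last week's dump.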

The difference in approaches?  It's the difference between shipping 
backup floppies via the network, and actually using the network.  It's 
the difference between reading a whole data dump or complete export, 
and reading either the live data or a delta of the changes since the 
last transfer.  Between seeing the actual data, and seeing the data as 
of the last export-import.

Pragmatically, you're basically working toward writing your own 
database.  Which gets old.  But that's what you want to do, so have at. 
In expending that effort, you'll learn about tools such as xxd and 
DUMP and their ilk, about little-endian and big-endian storage, about 
VAX and IEEE float, and a variety of other topics.  As you get further 
along in your bespoke database and as your app requirements evolve, 
you'll potentially also be learning about sockets and related 
operations, maybe transactions, as well as security.
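On the endian topic specifically, the usual move is to define the byte 
order of the on-disk format and shift bytes explicitly, rather than 
dumping structs with memcpy and hoping.  A minimal sketch, with the 
format arbitrarily declared little-endian:

#include <stdio.h>
#include <stdint.h>

/* Serialize a 32-bit value as little-endian bytes regardless of
   what the host happens to be. */
static void put_le32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v);
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* The reverse shift-and-OR reads it back on any host. */
static uint32_t get_le32(const uint8_t *in)
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

int main(void)
{
    uint8_t buf[4];
    uint32_t v = 0x11223344;

    put_le32(buf, v);
    /* Same bytes on VAX, Alpha, x86-64, or a big-endian box; pipe a
       real file through xxd (or DUMP on VMS) to check your work. */
    printf("%02x %02x %02x %02x -> %08x\n",
           buf[0], buf[1], buf[2], buf[3], get_le32(buf));
    return 0;
}

Floats are the harder part: VAX F- and G-float differ from IEEE in 
layout, not just byte order, so moving floats between VAX and IEEE 
hosts needs an actual conversion (routines such as CVT$CONVERT_FLOAT, 
or the compiler float switches), not a byte swap.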



-- 
Pure Personal Opinion | HoffmanLabs LLC 



