[Info-vax] Portable OpenVMS binary data format?
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Thu Aug 9 12:21:08 EDT 2018
On 2018-08-09 08:48:07 +0000, John E said:
>>
>> You have been exceedingly stingy at providing details such as the
>> volume of data involved,
> Because it's not one use case I'm considering, it's a multitude of use
> cases (with both large and small datasets, and some stuff that only
> needs to be transferred one time and in one direction and other stuff
> that needs to be transferred multiple times and in both directions). I
> just wanted some general guidelines and info and am very grateful to
> you and others for the info.
>
>>> E.g. you can process and whittle down a large Stata dataset on a huge
>>> linux machine with tons of disk & memory, then download the smaller
>>> dataset to your laptop for continued work. Super convenient and
>>
>> Something to aspire to? Well, that's a pretty bad design, actually.
>> Primitive. Clumsy. I'd expect these and other tools — and the case
>> you're working on here — would work as well with remote access to text
>> or binary data over SSL, without having to transfer the files around.
>
> I'm not going to argue this but don't really see what point you're
> trying to make. In my mental model, SAS & stata data sets, HDF files,
> and CDF files are all trying to do more or less the same thing:
> provide portable binary formats with metadata. I don't understand why
> this is OK for CDF but not SAS or stata? But I also only vaguely
> understand CDF as being similar to HDF which I'm more familiar with.
>
> And I'm also not a huge lover of the SAS & stata walled-gardens
> approach, but if you're stuck in those lovely gardens the portable data
> formats are a nice feature, at least.
I'm referring to the whole approach. It's primitive and clumsy. Now
it will definitely work once the bugs are resolved, so there's that.
Why clumsy? You're effectively asking for help creating a bespoke
binary backup tool here as part of your bespoke database, and then
planning to use that tool for interchange. If you're interested or
want or need to do that, have at.
Maybe have a look at CDF or some other standard format, and reduce the
amount of code and ease the transportability of the data. Bespoke
formats mean you get to deal (more) with other folks wanting to access
the data.
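A rough sketch of what that buys you, in Python with h5py, with HDF5
standing in here for CDF or whatever standard container fits; the file,
dataset, and attribute names are illustrative, not prescriptive:

    import numpy as np
    import h5py

    # Write: the library deals with endianness, alignment, and float
    # formats, so the bespoke byte-shuffling code goes away.
    with h5py.File("survey.h5", "w") as f:
        dset = f.create_dataset("measurements", data=np.random.rand(1000, 3))
        dset.attrs["units"] = "volts"
        dset.attrs["source"] = "OpenVMS export"

    # Read it back anywhere -- Linux, macOS, Windows -- with no
    # byte-swapping code of your own.
    with h5py.File("survey.h5", "r") as f:
        data = f["measurements"][:]
        print(f["measurements"].attrs["units"], data.shape)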
When looking at live data, use a network connection into the server
with YAML or XML or ilk via HTTPS, or a bespoke socket with whatever
text or binary format via TCP or SSL. DEC was offering developers this
sort of schtick decades ago, with the LiveLink package; it's not a
particularly new approach. Microsoft, SAS, SAP and others now provide
that same sort of online access. Either vacuum the database and
effectively export the whole thing directly into a spreadsheet (without
the file and without the FTP, and preferably with TLS and credentials),
or let the remote client access the live data.
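On the client side, that can be as small as this sketch; the URL, the
token, and the JSON field names are all assumptions about what a given
server offers:

    import json
    import urllib.request

    # Hypothetical endpoint; swap in whatever the server actually serves.
    url = "https://example.com/api/measurements?since=2018-08-01"
    req = urllib.request.Request(url,
        headers={"Authorization": "Bearer <token>"})

    with urllib.request.urlopen(req) as resp:  # TLS handled by the library
        rows = json.load(resp)

    for row in rows:
        print(row["timestamp"], row["value"])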
The difference in approaches? It's the difference between using backup
floppies via the network, and using the network. It's the difference
between reading a whole data dump or complete export, and either the
live data or a delta of the changes since the last transfer. Of seeing
the actual data, or the data from the last export-import.
Pragmatically, you're basically working toward writing your own
database. Which gets old. But that's what you want to do, so have at.
In expending that effort, you'll learn about tools such as xxd and
DUMP and ilk, and about little-endian and big-endian storage, about VAX
and IEEE float, and a variety of other topics. As you get further
along in your bespoke database and as your app requirements evolve,
you'll potentially also be learning about sockets and related
operations, maybe transactions, as well as security.
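For a taste of the endian part of that morass, a minimal Python sketch;
note that the struct module only speaks IEEE, so VAX F-float needs its
own converter on top of this:

    import struct

    # The same 32-bit IEEE single, packed both ways.
    value = 3.14159
    little = struct.pack("<f", value)   # little-endian (x86, Alpha, Itanium)
    big    = struct.pack(">f", value)   # big-endian (network order, SPARC)

    print(little.hex(), big.hex())      # same bytes, reversed order

    # Reading data written on the "other" architecture with the wrong
    # specifier silently yields garbage, not an error:
    print(struct.unpack("<f", big)[0])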
--
Pure Personal Opinion | HoffmanLabs LLC