[Info-vax] wrong file format

Thu Dec 31 17:29:53 EST 2020

On 2020-12-31 11:07:20 +0000, Dirk Munk said:

> Bill Gunshannon wrote:
>> On 12/30/20 7:59 AM, Dirk Munk wrote:
>>> Jan-Erik Söderholm wrote:
>>>> Den 2020-12-29 kl. 23:05, skrev Dirk Munk:
>>>>> No, of course Unix is not immune. Using <lf> or <cr> (Windows) as 
>>>>> record terminator is a rather silly idea. It means that you can't use 
>>>>> those characters in a record, and you have to scan the contents of a 
>>>>> file for those characters. Simply writing the length of a record at the 
>>>>> beginning of that record is a far better solution.
>>>> 
>>>> Having a <LF> or a <CR> in text files seems rather logical to me. What 
>>>> else, if you want either a line feed or a carriage return?
>>>> 
>>>> But yes, there are other ways to specify and delimit a "line of 
>>>> text", if you have a system supporting that.
>>>> 
>>>> Now, if that "record" is something else than a "line of text"...
>>> 
>>> The problem is that in Unix and Windows land there is no difference 
>>> between the metadata of a file, and the actual contents of a file. The 
>>> metadata should define the file and the records in the file, that 
>>> should be completely separate from the actual data contents of the file.
>> 
>> Can't speak for Windows, but Unix has no meta-data. Unix has only one 
>> file type, a stream of bytes.  Everything else is application layer.
> 
> Which means you don't have a clue about the contents of a file, until 
> you know the internals of the application. Standard VMS applications 
> produce structured files, so you only have to worry about the data 
> contents. It is possible to write your own applications using the files 
> of another application. The application can be in any language, because 
> RMS is the layer between the application and the file. This is a 
> structured approach, instead of producing a diarrhea of bytes, and 
> calling it a file.

Most of us use file magic to see what sort of app file we're looking 
at. And those same file magic tools common on Unix are also pretty good 
at identifying common OpenVMS-format files, too.
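
For illustration, here is a minimal Python sketch of the file magic idea: 
match the leading bytes of a file against a short table of well-known 
signatures. The names SIGNATURES, sniff, and sniff_file are assumptions 
made up for this example; real tools such as file(1) consult a far larger 
database, and nothing in this table covers OpenVMS-specific formats.

```python
# A tiny, illustrative subset of well-known leading-byte signatures.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"\x1f\x8b", "gzip data"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x7fELF", "ELF executable"),
    (b"SQLite format 3\x00", "SQLite database"),
]


def sniff(data: bytes) -> str:
    """Guess a file type from the first bytes of its content."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff(f.read(32))
```

Note that the guess needs no file-system metadata at all, only the 
content.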

RMS is a database that emulates punched cards and provides the related 
sorting and arrays from that era, and it still works pretty well where 
punched cards and related access is a workable data store for an app.

RMS is problematic when it comes to updates, modifications, or pretty 
much anything past punched cards. And RMS knows nothing about the data 
and data format and encoding used in the record.

Nobody's suggesting removing RMS. But a database that tops out with 
key-value access and DEC MCS support is not going to be viewed as a 
product differentiator.

>>> Suppose I have a file with binary data, and one byte has the binary 
>>> (ascii) value of <lf>, then Unix will use it as a record separator, 
>>> even if it is in the middle of the actual data of that record.

As stated below, Unix doesn't work that way any more than PRINT 
SYS$SYSTEM:AUTHORIZE.EXE "works".
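
A small Python sketch of that point, assuming nothing beyond the standard 
library: a byte with the value of <lf> written into a binary file comes 
back unchanged, and any splitting into "lines" is a choice the reading 
program makes. The file name blob.bin is arbitrary.

```python
import os
import tempfile

# Five bytes of binary data; 0x0A happens to be the ASCII <lf>.
payload = bytes([0x01, 0x02, 0x0A, 0x03, 0x04])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blob.bin")
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        read_back = f.read()  # the file layer returns the bytes as written

# Splitting on <lf> is something the application opts into.
as_lines = read_back.split(b"\n")
```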

>> Unix has no records. If you cat the file it will line break at the 
>> <lf>. If you od -c the file it will identify the <lf> as just that.
> 
> Wonderful. However, it is clear that in many applications the notion of 
> a data record is present, and that the <lf> is used as record 
> separator, even if Unix formally doesn't have records.

I'm finding myself poking around in the databases of other apps rather 
less often. OpenVMS or elsewhere.

Interestingly—when poking around in an app data store is 
required—that's often gotten much easier on Unix, as the data stores 
are increasingly using the local analog to RMS.

That local equivalent tends to be SQLite, in the environments I'm often 
working in. And SQLite provides much better clues about "record" 
formats and fields and field relationships, too.
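
As a rough sketch of what those clues look like, the Python below asks an 
SQLite database to describe itself through sqlite_master, PRAGMA 
table_info, and PRAGMA foreign_key_list. The users and logins tables are 
hypothetical, created in memory purely for the example.

```python
import sqlite3


def describe(conn):
    """Return {table: [(column, declared_type), ...]} for every table."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info columns: cid, name, type, notnull, dflt_value, pk
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(c[1], c[2]) for c in cols]
    return schema


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE logins (user_id INTEGER REFERENCES users(id), at TEXT);
    """
)

schema = describe(conn)
# foreign_key_list columns: id, seq, table, from, to, on_update, ...
fks = conn.execute('PRAGMA foreign_key_list("logins")').fetchall()
```

The column names, declared types, and the link from logins.user_id to 
users.id all come from the data store itself, without reading the app's 
source.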

>>> Suppose you have a VMS file with fixed record size. That file has no 
>>> record separators whatsoever, it is one long stream of data. VMS can 
>>> calculate where the records start and end in the file. Suppose it 
>>> consists of sets of three records of 100 bytes that belong 
>>> together. Then you can change the attributes of that file to records of 
>>> 300 bytes, and in one read operation you will have all the data that 
>>> belongs together. I've actually used this in the past.
>> 
>> And that would be an application concept, not really an OS thing.
> 
> Actually not, since this can only be done because of the way RMS stores 
> data, and RMS is part of the OS.
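
For comparison, the same fixed-length reinterpretation can be mimicked at 
the application layer on a plain stream of bytes. This Python sketch does 
not use RMS; on OpenVMS the record size is a file attribute, while here it 
is just an argument to a hypothetical helper named fixed_records.

```python
def fixed_records(data: bytes, size: int) -> list[bytes]:
    """Slice a byte stream into fixed-length records of `size` bytes."""
    if size <= 0 or len(data) % size != 0:
        raise ValueError("data length is not a multiple of the record size")
    return [data[i:i + size] for i in range(0, len(data), size)]


# Nine 100-byte records, each filled with its own record number.
data = b"".join(bytes([n]) * 100 for n in range(9))

small = fixed_records(data, 100)  # nine records of 100 bytes
large = fixed_records(data, 300)  # the same bytes as three 300-byte records
```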

Which is part and parcel of what some of us have been grumbling about 
for years: that OpenVMS, well, stopped enhancing its data access in the 
1980s, and that the efforts to embed Rdb database run-time support 
ended with the sale of Rdb to Oracle.

SQLite would be one option here for future integration alongside RMS, 
and there are others.

>>> Suppose you want to print such a file, then VMS will send a <cr> and a 
>>> <lf> to the printer after each record. Simple.
>> 
>> VMS won't.  Whatever application actually prints it will.
> 
> Obviously, this is functionality of the spooler, and that is part of VMS.

OpenVMS printing really shouldn't be printing non-printable files—see 
file magic above, see PRINT SYS$SYSTEM:AUTHORIZE.EXE above, etc. 
Printing needs work. But here we are.

>>> The DEC software engineers understood very well why it is a bad idea to 
>>> mix up contents of a file with the structure of a file, and that's why 
>>> they did not use stream files as standard RMS files in applications. 
>>> They are just there for compatibility with Unix, Windows etc.
>> 
>> And Unix made all files streams of bytes and lets the applications 
>> decide what to do with them.  Not really an OS problem.

Ayup. And I've found the Unix approach works very well. OpenVMS folks 
do the same wad-of-bytes thing here with the add-on databases, too, 
with Oracle Rdb, SQLite, and other such.

> Exchanging data between applications is rather important. Those 
> applications can be written in many languages, can come from different 
> sources. It is obvious that well structured files are paramount for 
> exchanging data between applications. That is why something like RMS is 
> in fact a very modern approach to structured software engineering, 
> instead of producing an unstructured diarrhea of bytes, and calling 
> it a file.

Can't say I really want to document my internal data store as my 
import-export API, but you do you.  If it's SQLite, that access does 
work decently well, as SQLite databases are themselves quite portable 
including across endianness differences. But even on OpenVMS, poking 
directly into an app's RMS database is close to reverse-engineering, 
and not really an interface that most app developers want to support. 
Poking directly into SYSUAF isn't recommended, even if SYSUAF currently 
uses RMS. Providing a supported and documented data import-export 
interface is much more typical. That interchange format might be YAML 
or XML or any 
number of other interfaces, and frameworks and tooling are available 
for all of the common formats. Put differently, we've moved from 
abstracting at the RMS or SQLite or other layer to a higher-level 
abstraction or data import-export interface.
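
A minimal sketch of such an import-export interface, assuming XML as the 
interchange format and using only the Python standard library. The record 
layout (a list of dicts with name and uid keys) and the function names 
export_users and import_users are hypothetical, chosen for the example; 
the point is that the exchange format stays independent of however the 
app stores its data internally.

```python
import xml.etree.ElementTree as ET


def export_users(rows):
    """Serialize a list of {'name': str, 'uid': int} dicts to XML text."""
    root = ET.Element("users")
    for row in rows:
        ET.SubElement(root, "user", name=row["name"], uid=str(row["uid"]))
    return ET.tostring(root, encoding="unicode")


def import_users(text):
    """Parse XML text produced by export_users back into a list of dicts."""
    root = ET.fromstring(text)
    return [
        {"name": el.get("name"), "uid": int(el.get("uid"))}
        for el in root.findall("user")
    ]


rows = [{"name": "SYSTEM", "uid": 1}, {"name": "GUEST", "uid": 2}]
document = export_users(rows)
round_trip = import_users(document)
```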

And RMS has other gaps here beyond its inability to import and export 
its common files, too. RMS never got around to providing support for 
consistent live backups, though various add-on databases do. RMS lacks 
data definitions within records, too. CDD/Repository went to Oracle, 
and that and SDL and other data definition tooling. RMS gets you the 
record, yes, but is less than useful with the app data within the 
record, and the character encoding for the data, and related details. 
And RMS itself—and BACKUP has similar issues—stinks at identifying and 
repairing issues of metadata. Which was the reason for this thread.

Again, RMS was great in the last millennium. Its age is showing. 
There's absolutely no reason to remove RMS, but RMS is comparatively 
limited. But if RMS works for you, have at it. Streams-of-bytes file 
systems, and SQLite and other databases, and other tooling work well 
for others. RMS is just not something I miss, when working on other 
platforms.

-- 
Pure Personal Opinion | HoffmanLabs LLC