[Info-vax] wrong file format

Fri Jan 1 09:09:36 EST 2021

Stephen Hoffman wrote:
> On 2020-12-31 11:07:20 +0000, Dirk Munk said:
> 
>> Bill Gunshannon wrote:
>>> On 12/30/20 7:59 AM, Dirk Munk wrote:
>>>> Jan-Erik Söderholm wrote:
>>>>> Den 2020-12-29 kl. 23:05, skrev Dirk Munk:
>>>>>> No, of course Unix is not immune. Using <lf> or <cr> (Windows) as 
>>>>>> record terminator is a rather silly idea. It means that you can't 
>>>>>> use those characters in a record, and you have to scan the 
>>>>>> contents of a file for those characters. Simply writing the length 
>>>>>> of a record at the beginning of that record is a far better solution.
>>>>>
>>>>> Having a <LF> or a <CR> in text files seems rather logical to me. 
>>>>> What else, if you want either a line feed or a carriage return?
>>>>>
>>>>> But yes, there are other ways to specify and delimit a "line of 
>>>>> text", if you have a system supporting that.
>>>>>
>>>>> Now, if that "record" is something else than a "line of text"...
>>>>
>>>> The problem is that in Unix and Windows land there is no difference 
>>>> between the metadata of a file, and the actual contents of a file. 
>>>> The metadata should define the file and the records in the file, 
>>>> that should be completely separate from the actual data contents of 
>>>> the file.
>>>
>>> Can't speak for Windows, but Unix has no meta-data. Unix has only one 
>>> file type, a stream of bytes.  Everything else is application layer.
>>
>> Which means you don't have a clue about the contents of a file, until 
>> you know the internals of the application. Standard VMS applications 
>> produce structured files, so you only have to worry about the data 
>> contents. It is possible to write your own applications using the 
>> files of another application. The application can be in any language, 
>> because RMS is the layer between the application and the file. This is 
>> a structured approach, instead of producing a diarrhea of bytes, and 
>> calling it a file.
> 
> Most of us use file magic to see what sort of app file we're looking at. 
> And those same file magic tools common on Unix are also pretty good at 
> identifying common OpenVMS-format files, too.
> 
> RMS is a database that emulates punched cards and provides the related 
> sorting and arrays from that era, and it still works pretty well where 
> punched cards and related access is a workable data store for an app.
> 

RMS offers all kinds of sequential file types, relative files, and 
indexed sequential files. The latter may be called database files, but 
sequential files surely not. You may compare them with punched cards, 
and I would compare Unix files with a container of shreds that is left 
over from punching cards.

> RMS is problematic when it comes to updates, modifications, or pretty 
> much anything past punched cards. 

Never had any problems with that, on the contrary. And I wrote many 
programs that updated, modified, extended RMS files.

> And RMS knows nothing about the data 
> and data format and encoding used in the record.

True, that could be improved.

> 
> Nobody's suggesting removing RMS. But a database that tops out with 
> key-value access and DEC MCS support is not going to be viewed as a 
> product differentiator.
> 
>>>> Suppose I have a file with binary data, and one byte has the binary 
>>>> (ascii) value of <lf>, then Unix will use it as a record separator, 
>>>> even if it is in the middle of the actual data of that record.
> 
> As stated below, Unix doesn't work that way any more than PRINT 
> SYS$SYSTEM:AUTHORIZE.EXE "works".

Unfortunately, many Unix utilities work that way if they expect ASCII files.

> 
>>> Unix has no records. If you cat the file it will line break at the 
>>> <lf>. If you od -c the file it will identify the <lf> as just that.
>>
>> Wonderful. However, it is clear that in many applications the notion 
>> of a data record is present, and that the <lf> is used as record 
>> separator, even if Unix formally doesn't have records.
> 
> I'm finding myself poking around in the databases of other apps rather 
> less often. OpenVMS or elsewhere.
> 
> Interestingly—when poking around in an app data store is required—that's 
> often gotten much easier on Unix, as the data stores are increasingly 
> using the local analog to RMS.
> 
> That local equivalent tends to be SQLite, in the environments I'm often 
> working in. And SQLite provides much better clues about "record" formats 
> and fields and field relationships, too.
> 

Sure, great stuff. But that is a database! I've often used indexed 
sequential files in DCL, simple read and write commands, no need for 
SQL. And in many cases indexed sequential files are fine (and fast) as 
database files in applications as well.
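
To illustrate the point quoted above about SQLite giving clues about "record" formats and fields, here is a minimal Python sketch. The table and column names are made up for the example; the only thing it relies on is the standard sqlite3 module and SQLite's PRAGMA table_info, which reports the declared columns of a table.

```python
import sqlite3

# In-memory database with a hypothetical table, just for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, balance REAL)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

for name, decl_type in columns:
    print(f"{name}: {decl_type}")
```

The field names and declared types travel with the file itself, which is the kind of record description RMS does not carry today.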

>>>> Suppose you have a VMS file with fixed record size. That file has no 
>>>> record separators whatsoever; it is one long stream of data. VMS 
>>>> can calculate where the records start and end in the file. Suppose 
>>>> it consists of sets of three records of 100 bytes that belong 
>>>> together. Then you can change the attributes of that file to records 
>>>> of 300 bytes, and in one read operation you will have all the data 
>>>> that belongs together. I've actually used this in the past.
>>>
>>> And that would be an application concept, not really an OS thing.
>>
>> Actually not, since this can only be done because of the way RMS 
>> stores data, and RMS is part of the OS.
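
The fixed-record trick described above can be sketched in a few lines of Python. This is only an illustration of the idea, not of RMS itself: on VMS the record size is a file attribute kept outside the data, whereas here it is simply a parameter (record_size) passed to a hypothetical read_fixed helper.

```python
import io

def read_fixed(stream, record_size):
    """Yield consecutive fixed-size records; the data holds no separators."""
    while True:
        chunk = stream.read(record_size)
        if not chunk:
            return
        if len(chunk) != record_size:
            raise ValueError("trailing partial record")
        yield chunk

# Six 100-byte records: 100 x b"A", 100 x b"B", and so on.
data = b"".join(bytes([65 + i]) * 100 for i in range(6))

# Same bytes, two different record sizes.
small = list(read_fixed(io.BytesIO(data), 100))
grouped = list(read_fixed(io.BytesIO(data), 300))

print(len(small), len(grouped))
```

Because record boundaries are computed rather than stored, changing the record size from 100 to 300 regroups the same bytes without rewriting anything.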
> 
> Which is part and parcel of what some of us have been grumbling about 
> for years: that OpenVMS, well, stopped enhancing its data access in the 
> 1980s, and that the efforts to embed the Rdb database run-time support 
> ended with the sale of Rdb to Oracle.

Yes, that was a monumental stupidity. Leave it to managers to be that 
stupid. Rdb should have been an integral part of VMS. Perhaps VSI can 
buy it back from Oracle?

RMS should be updated as well. More functionality, like record 
descriptions, larger record size etc, would be great. Perhaps if we ask 
Hein nicely, he can do it.

> 
> SQLite would be one option here for future integration alongside RMS, 
> and there are others.
>

I have no problem with that.

>>>> Suppose you want to print such a file, then VMS will send a <cr> and 
>>>> a <lf> to the printer after each record. Simple.
>>>
>>> VMS won't.  Whatever application actually prints it will.
>>
>> Obviously, this is functionality of the spooler, and that is part of VMS.
> 
> OpenVMS printing really shouldn't be printing non-printable files—see 
> file magic above, see PRINT SYS$SYSTEM:AUTHORIZE.EXE above, etc. 
> Printing needs work. But here we are.
> 

These are not non-printable files. The contents are plain ASCII; the 
spooler just adds a <cr> and a <lf> when a record is sent to the printer. 
This has been standard since RSX!

>>>> The DEC software engineers understood very well why it is a bad idea 
>>>> to mix up contents of a file with the structure of a file, and 
>>>> that's why they did not use stream files as standard RMS files in 
>>>> applications. They are just there for compatibility with Unix, Windows 
>>>> etc.
>>>
>>> And Unix made all files streams of bytes and lets the applications 
>>> decide what to do with them.  Not really an OS problem.
> 
> Ayup..  And I've found the Unix approach works very well. OpenVMS folks 
> do the same wad-of-bytes thing here with the add-on databases, too; with 
> Oracle Rdb, SQLite, and other such.

Database files also offer structured data. I would not expect databases 
to be implemented on top of RMS, that would be a bit silly. Databases 
like Rdb, SQLite, Oracle, DBMS etc. can be seen as an alternative to 
RMS, with more and other functionality.

> 
>> Exchanging data between applications is rather important. Those 
>> applications can be written in many languages, can come from different 
>> sources. It is obvious that well structured files are paramount for 
>> exchanging data between applications. That is why something like RMS 
>> is in fact a very modern approach to structured software engineering, 
>> instead of producing an unstructured diarrhea of bytes, and calling 
>> it a file.
> 
> Can't say I really want to document my internal data store as my 
> import-export API, but you do you.  If it's SQLite, that access does 
> work decently well, as SQLite databases are themselves quite portable 
> including across endianness differences. But even on OpenVMS, poking 
> directly into an app's RMS database is close to reverse-engineering, and 
> not really an interface that most app developers want to support. Poking 
> directly into SYSUAF isn't recommended, even if SYSUAF currently uses 
> RMS. Providing a supported and documented data import-export is much 
> more typical. That interchange format might be YAML or XML or any number 
> of other interfaces, and frameworks and tooling are available for all of 
> the common formats. Put differently, we've moved from abstracting at the 
> RMS or SQLite or other layer to a higher-level abstraction or data 
> import-export interface.
> 
> And RMS has other gaps here beyond its inability to import and export 
> its common files, too. RMS never got around to providing support for 
> consistent live backups, though various add-on databases do. RMS lacks 
> data definitions within records, too. CDD/Repository went to Oracle, and 
> that and SDL and other data definition tooling. RMS gets you the record 
> yes, but is less than useful with the app data within the record, and 
> the character encoding for the data, and related details. And RMS 
> itself—and BACKUP has similar issues—stinks at identifying and repairing 
> issues of metadata. Which was the reason for this thread.
> 

The reason for this thread was that record delimiters (<cr> and <lf>) 
ended up as data in records. With the solution I offered, they were 
removed from the data. My point is that standard RMS sequential files 
offer envelopes that contain the data, RMS does not have to inspect the 
data to find record delimiters like <cr> and <lf>.
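
A rough Python sketch of what I mean by an envelope, compared with delimiter scanning. The 2-byte little-endian length prefix is a simplification chosen for the example and is not meant as a description of the actual RMS on-disk layout; write_record and read_records are hypothetical helper names.

```python
import io
import struct

def write_record(stream, payload: bytes) -> None:
    # Envelope: 2-byte length prefix, then the payload untouched.
    stream.write(struct.pack("<H", len(payload)))
    stream.write(payload)

def read_records(stream):
    # The reader never inspects the payload; it only trusts the length.
    while True:
        header = stream.read(2)
        if len(header) < 2:
            return
        (length,) = struct.unpack("<H", header)
        yield stream.read(length)

# The second record contains <lf> and <cr> as ordinary data bytes.
records = [b"plain text", b"binary\x0a\x0ddata", b""]

buf = io.BytesIO()
for r in records:
    write_record(buf, r)
buf.seek(0)
framed = list(read_records(buf))

# Delimiter approach: join with <lf>, then split on <lf> again.
delimited = b"\n".join(records).split(b"\n")

print(framed == records)     # length-prefixed records round-trip intact
print(delimited == records)  # the embedded <lf> splits a record in two
```

With the envelope, the <lf> inside the second record is just data. With the delimiter, the same byte is taken as a record boundary and the record count no longer matches.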

> Again, RMS was great in the last millennium. Its age is showing. There's 
> absolutely no reason to remove RMS, but RMS is comparatively limited.  
> But if RMS works for you, have at. Streams-of-bytes file systems, and 
> SQLite and other databases, and other tooling work well for others. RMS 
> is just not something I miss, when working on other platforms.
> 
>

Sure, RMS is limited, it is not a database. But it offers structured 
data storage, even for simple sequential files. And I always prefer 
structured data over a diarrhea of bytes.

Can RMS be improved? Sure it can, and it should.