[Info-vax] wrong file format

Arne Vajhøj arne at vajhoej.dk
Thu Dec 31 16:13:21 EST 2020


On 12/31/2020 3:58 PM, Dirk Munk wrote:
> Bill Gunshannon wrote:
>> On 12/31/20 6:07 AM, Dirk Munk wrote:
>>> Bill Gunshannon wrote:
>>>> On 12/30/20 7:59 AM, Dirk Munk wrote:
>>>>> The problem is that in Unix and Windows land there is no difference 
>>>>> between the metadata of a file, and the actual contents of a file. 
>>>>> The metadata should define the file and the records in the file, 
>>>>> that should be completely separate from the actual data contents of 
>>>>> the file.
>>>>
>>>> Can't speak for Windows, but Unix has no meta-data. Unix has only one
>>>> file type, a stream of bytes.  Everything else is application layer.
>>>
>>> Which means you don't have a clue about the contents of a file, until 
>>> you know the internals of the application. 
>>
>> Well, that isn't exactly true.  Certain file types do have clues.
>> And, at least under Unix, there is an application that will do a
>> very good job of identifying what the file is.  It is even possible
>> to add your own hints if they exist and if you so desire.
> 
> Nice, but suppose you have a Cobol compiler on Unix, then it will have 
> to set up its own file system with all the files Cobol supports, like 
> indexed files. What will that application do with those files? RMS will 
> tell you the structure of the file, you don't have to guess it.

RMS will always have the information about the record format.

For index-sequential files RMS will have information about the keys, but
it will not have information about the non-key part (which can actually
be different for different records).
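
To illustrate why it matters whether record boundaries are carried by
the format or inferred from a delimiter byte, here is a minimal Python
sketch. The 2-byte little-endian length prefix and the function names
pack_records/unpack_records are assumptions made for illustration; this
is not the actual RMS on-disk layout.

```python
import struct

def pack_records(records):
    """Serialize each record as <2-byte little-endian length><payload>."""
    out = bytearray()
    for rec in records:
        out += struct.pack("<H", len(rec)) + rec
    return bytes(out)

def unpack_records(blob):
    """Recover records using only the length prefixes, never the payload bytes."""
    records = []
    pos = 0
    while pos < len(blob):
        (length,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        records.append(blob[pos:pos + length])
        pos += length
    return records

# Three records; the first two contain an LF byte (0x0A) as ordinary data.
records = [b"abc\ndef", b"\x00\x0a\xff", b"plain"]

# Length-prefixed: boundaries survive regardless of payload contents.
assert unpack_records(pack_records(records)) == records

# LF-delimited: the embedded LF bytes are mistaken for record separators.
naive = b"\n".join(records).split(b"\n")
print(len(records), len(naive))
```

With the length prefix the reader never has to guess where a record
ends; with a delimiter, any payload byte equal to the delimiter splits
the record unless the application adds its own escaping.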

>>>>> Suppose I have a file with binary data, and one byte has the binary 
>>>>> (ascii) value of <lf>, then Unix will use it as a record separator, 
>>>>> even if it is in the middle of the actual data of that record.
>>>>
>>>> Unix has no records. If you cat the file it will line break at the 
>>>> <lf>.
>>>> If you od -c the file it will identify the <lf> as just that.
>>>>
>>>
>>> Wonderful. However, it is clear that in many applications the notion 
>>> of a data record is present, and that the <lf> is used as record 
>>> separator, even if Unix formally doesn't have records.
>>
>> Again, that is more of a C'ism than a Unix'ism.  If I write an
>> application that uses ^M instead of ^J it will work just fine.
>> And there is no reason why I couldn't have ^J as a valid,
>> non-record-terminating character in the file.
> 
> Sure you can. But the standard (used for instance by FTP ASCII 
> transfers) is <lf>.

The *nix standard is definitely LF.

But most network protocols, including FTP, use CR LF.

From the FTP RFC (RFC 959):

<quote>
In accordance with the NVT standard, the <CRLF> sequence
should be used where necessary to denote the end of a line
of text.
</quote>

<quote>
                        If this division is necessary,
the FTP implementation should use the end-of-line sequence,
<CRLF> for ASCII, or <NL> for EBCDIC text files, as the
delimiter.
</quote>
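
As a rough sketch of what that means for a *nix system doing an ASCII
transfer, the sender maps its local LF line ends to CR LF on the wire
and the receiver maps them back. The function names below are
hypothetical and this is not taken from any particular FTP
implementation.

```python
def to_network_ascii(data: bytes) -> bytes:
    """Convert local LF line ends to CR LF without doubling existing CR LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def from_network_ascii(data: bytes) -> bytes:
    """Convert CR LF line ends received from the network to local LF."""
    return data.replace(b"\r\n", b"\n")

# A text file round-trips cleanly.
text = b"line one\nline two\n"
wire = to_network_ascii(text)
assert wire == b"line one\r\nline two\r\n"
assert from_network_ascii(wire) == text

# Binary data that happens to contain CR LF does not survive the same path.
binary = b"\x01\r\n\x02"
assert from_network_ascii(to_network_ascii(binary)) != binary
```

The last assertion is why binary files have to be sent in image
(binary) mode: the ASCII conversion cannot tell a line end from data
bytes that happen to have the same values.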

Arne


