[Info-vax] wrong file format

Fri Jan 1 11:13:15 EST 2021

On 1/1/21 10:46 AM, Dirk Munk wrote:
> Bill Gunshannon wrote:
>> On 12/31/20 3:58 PM, Dirk Munk wrote:
>>> Bill Gunshannon wrote:
>>>> On 12/31/20 6:07 AM, Dirk Munk wrote:
>>>>> Bill Gunshannon wrote:
>>>>>> On 12/30/20 7:59 AM, Dirk Munk wrote:
>>>>>>> Jan-Erik Söderholm wrote:
>>>>>>>> Den 2020-12-29 kl. 23:05, skrev Dirk Munk:
>>>>>>>>> Bill Gunshannon wrote:
>>>>>>>>>> On 12/29/20 9:21 AM, Jan-Erik Söderholm wrote:
>>>>>>>>>>> Den 2020-12-29 kl. 14:35, skrev Phillip Helbig (undress to 
>>>>>>>>>>> reply):
>>>>>>>>>>>> In article <rsfarr$smk$1 at dont-email.me>, Dirk Munk 
>>>>>>>>>>>> <munk at home.nl>
>>>>>>>>>>>> writes:
>>>>>>>>>>>>
>>>>>>>>>>>>>> I tried for about 45 minutes---all the suggestions posted 
>>>>>>>>>>>>>> here! It was
>>>>>>>>>>>>>> about 100 MB, so not all were quick to check.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the end, I managed to transfer it again (don't ask!) 
>>>>>>>>>>>>>> and somehow,
>>>>>>>>>>>>>> magically, it was OK.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I've dealt with problems like these before, usually caused by
>>>>>>>>>>>>> applications that were not written for VMS.
>>>>>>>>>>>>>
>>>>>>>>>>>>> You need to have a bit of a feeling for the different file 
>>>>>>>>>>>>> types of VMS
>>>>>>>>>>>>> to fix these problems, but if you have that, it's very 
>>>>>>>>>>>>> simple to solve
>>>>>>>>>>>>> these little puzzles.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't know how many times I've used SET FILE/ATTR or 
>>>>>>>>>>>> CONVERT or TECO
>>>>>>>>>>>> to fix things like this.  I can usually look at the 
>>>>>>>>>>>> contents, look at
>>>>>>>>>>>> DIR/FULL, and see what needs to be done if they don't match, 
>>>>>>>>>>>> but this
>>>>>>>>>>>> was somehow different.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Nice that it was fixed! And no, I do not believe in magic... :-)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And, just so people don't think, based on earlier comments, that
>>>>>>>>>> Unix is somehow immune, I frequently have to remove "^M" 
>>>>>>>>>> characters
>>>>>>>>>> from text files on Unix. Unix's only saving grace in this 
>>>>>>>>>> regard is
>>>>>>>>>> that the solution is trivial.  :-)
>>>>>>>>>>
>>>>>>>>>> bill
>>>>>>>>>>
>>>>>>>>> No, of course Unix is not immune. Using <lf> (Unix) or <cr><lf>
>>>>>>>>> (Windows) as a record terminator is a rather silly idea. It means
>>>>>>>>> that you can't use those characters in a record, and you have to
>>>>>>>>> scan the contents of a file for those characters. Simply writing
>>>>>>>>> the length of a record at the beginning of that record is a far
>>>>>>>>> better solution.
>>>>>>>>
>>>>>>>> Having a <LF> or a <CR> in text files seems rather logical to me.
>>>>>>>> What else, if you want either a line feed or a carriage return?
>>>>>>>>
>>>>>>>> But yes, there are other ways to specify and delimit a "line of
>>>>>>>> text", if you have a system supporting that.
>>>>>>>>
>>>>>>>> Now, if that "record" is something else than a "line of text"...
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The problem is that in Unix and Windows land there is no 
>>>>>>> difference between the metadata of a file, and the actual 
>>>>>>> contents of a file. The metadata should define the file and the 
>>>>>>> records in the file, that should be completely separate from the 
>>>>>>> actual data contents of the file.
>>>>>>
>>>>>> Can't speak for Windows, but Unix has no meta-data. Unix has only one
>>>>>> file type, a stream of bytes.  Everything else is application layer.
>>>>>
>>>>> Which means you don't have a clue about the contents of a file, 
>>>>> until you know the internals of the application. 
>>>>
>>>> Well, that isn't exactly true.  Certain file types do have clues.
>>>> And, at least under Unix, there is an application that will do a
>>>> very good job of identifying what the file is.  It is even possible
>>>> to add your own hints if they exist and if you so desire.
>>>
>>> Nice, but suppose you have a Cobol compiler on Unix, then it will 
>>> have to set up its own file system with all the files Cobol supports, 
>>
>> Don't know what you mean by "set up its own file system".  COBOL will
>> open the files and, if necessary, create them for output.  To Unix
>> the files will be streams of bytes.  To COBOL programs they will be
>> sequential, line sequential, direct or indexed.
> 
> Yes indeed. And all those files created by those Cobol programs can only 
> be used by other Cobol programs created by that compiler.

Why on earth would you think that?  They can be used by any
program using any language you wish to program in.  As long
as you know the format and contents of the file, which you
also need to know to access them with COBOL.  If they were
written as PIC X(80) (or whatever length) you can just cat
them and see the contents.
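
To illustrate the point, here is a minimal sketch in Python (not part of
the original exchange) that reads a file of fixed-length 80-byte records,
such as a COBOL program might write for PIC X(80).  The layout (records
simply concatenated, no delimiters or headers, space-padded ASCII) and the
name read_fixed_records are assumptions made for the example; a particular
compiler's sequential format may differ.

```python
import io

RECORD_LENGTH = 80  # assumed: one PIC X(80) field per record


def read_fixed_records(stream, record_length=RECORD_LENGTH):
    """Yield each fixed-length record from a binary stream as text.

    Assumes the file is a plain concatenation of records with no
    delimiters or headers (an assumption of this sketch).
    """
    while True:
        chunk = stream.read(record_length)
        if not chunk:
            break
        if len(chunk) != record_length:
            raise ValueError("file length is not a multiple of the record length")
        # Strip the trailing spaces that pad a PIC X field.
        yield chunk.decode("ascii").rstrip(" ")


# Two sample records built in memory, each padded to 80 bytes.
sample = b"".join(
    text.ljust(RECORD_LENGTH).encode("ascii")
    for text in ("FIRST RECORD", "SECOND RECORD")
)
records = list(read_fixed_records(io.BytesIO(sample)))
print(records)
```

The only thing the reader needs is the record length, which is the same
knowledge a COBOL program would need in its FD.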

> 
> Compare that with VMS, where I can read and write those files by any 
> other program, written in any other language, or even with DCL if the 
> type of data in the files allows it.

Nothing different between them.

> 
>>
>>>                                                                   
>>> like indexed files. What will that application do with those files? 
>>> RMS will tell you the structure of the file, you don't have to guess it.
>>
>> I use GnuCOBOL.  Sequential files show up as "ASCII text", as does the
>> COBOL source file.  Indexed files report as "Berkeley DB", as that was
>> the option I chose for indexed files when I built GnuCOBOL.  Other COBOL
>> compilers (like MicroFocus) may differ.  Of course, the executable shows
>> up as "ELF 64-bit LSB shared object".  If I wanted to put in the
>> effort I could probably get it to identify the source as COBOL source,
>> but I see no reason to bother.
>>
> 
> Cobol source files are always text files of course. But again, with VMS 
> all the file types are offered by RMS, and can be used by any language 
> or even DCL.

And, as I stated above, the exact same is true of Unix and OS-9 and
RT-11 and Windows and probably every other OS.

> 
>>>
>>>>
>>>>>                                            Standard VMS 
>>>>> applications produce structured files, so you only have to worry 
>>>>> about the data contents. It is possible to write your own 
>>>>> applications using the files of another application. The 
>>>>> application can be in any language, because RMS is the layer 
>>>>> between the application and the file. This is a structured 
>>>>> approach, instead of producing a diarrhea of bytes, and calling it 
>>>>> a file.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Suppose I have a file with binary data, and one byte has the 
>>>>>>> binary (ascii) value of <lf>, then Unix will use it as a record 
>>>>>>> separator, even if it is in the middle of the actual data of that 
>>>>>>> record.
>>>>>>
>>>>>> Unix has no records. If you cat the file it will line break at the 
>>>>>> <lf>.
>>>>>> If you od -c the file it will identify the <lf> as just that.
>>>>>>
>>>>>
>>>>> Wonderful. However, it is clear that in many applications the 
>>>>> notion of a data record is present, and that the <lf> is used as 
>>>>> record separator, even if Unix formally doesn't have records.
>>>>
>>>> Again, that is more of a C'ism than a Unix'ism.  If I write an
>>>> application that uses ^M instead of ^J it will work just fine.
>>>> And there is no reason why I couldn't have ^J as a valid, non-
>>>> record terminating character in the file.
>>>
>>> Sure you can. But the standard (used for instance by FTP ASCII 
>>> transfers) is <lf>.
>>
>> OK, so what?  The default has to be something and that has been around
>> a long time.  And there are a lot of applications written to that
>> standard.  But I am not forced to use it.  How  much freedom do you
>> have?
> 
> RMS stores the DATA. How it stores the data is normally something you
> don't care about. Standard RMS does not confuse your data with record
> delimiters like <cr>, <lf>, or whichever other delimiter you want to use.
> That is the point, not how much freedom I have in the choice of delimiter.
> 
>>
>>>
>>>>
>>>>>
>>>>>>>
>>>>>>> Suppose you have a VMS file with fixed record size. That file has 
>>>>>>> no record separators whatsoever; it is one long stream of
>>>>>>> data. VMS can calculate where the records start and end in the
>>>>>>> file. Suppose it consists of sets of three records of 100
>>>>>>> bytes that belong together. Then you can change the attributes of 
>>>>>>> that file to records of 300 bytes, and in one read operation you 
>>>>>>> will have all the data that belongs together. I've actually used 
>>>>>>> this in the past.
>>>>>>
>>>>>> And that would be an application concept, not really an OS thing.
>>>>>
>>>>> Actually not, since this can only be done because of the way RMS 
>>>>> stores data, and RMS is part of the OS.
>>>>
>>>> See, there is where we differ in opinion.  I see RMS as an
>>>> application that just happens to ship with VMS.  Like editors,
>>>> compilers and other pieces that ship with the OS but are
>>>> not really part of it.  Surely VMS will run without RMS present.
>>>> Not all applications need to access files at all.
>>>
>>> No, RMS is more like middleware. How do you think that VMS could read 
>>> and write its own files if RMS is not present?
>>
>> Does a call to printf/scanf in a C program use RMS?
>> (Really, can someone answer that question?)
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Suppose you want to print such a file, then VMS will send a <cr> 
>>>>>>> and a <lf> to the printer after each record. Simple.
>>>>>>
>>>>>> VMS won't.  Whatever application actually prints it will.
>>>>>>
>>>>>
>>>>> Obviously, this is functionality of the spooler, and that is part 
>>>>> of VMS.
>>>>>
>>>>>>>
>>>>>>> The DEC software engineers understood very well why it is a bad 
>>>>>>> idea to mix up contents of a file with the structure of a file, 
>>>>>>> and that's why they did not use stream files as standard RMS 
>>>>>>> files in applications. They are just there for compatibility with
>>>>>>> Unix, Windows etc.
>>>>>>
>>>>>> And Unix made all files streams of bytes and lets the applications
>>>>>> decide what to do with them.  Not really an OS problem.
>>>>>>
>>>>>
>>>>> Exchanging data between applications is rather important. Those 
>>>>> applications can be written in many languages, can come from 
>>>>> different sources. It is obvious that well structured files are 
>>>>> paramount for exchanging data between applications. That is why 
>>>>> something like RMS is in fact a very modern approach to structured 
>>>>> software engineering, instead of producing an unstructured
>>>>> diarrhea of bytes, and calling it a file.
>>>>
>>>> Some see it otherwise.  Unix tends to leave more control for the
>>>> developer and not try and handcuff them with someone else's concept
>>>> of how things should be done.
>>>>
>>>
>>> If you must, you can do that with VMS as well. However, in 99.9% of 
>>> all applications, RMS with all of its functionality will give you 
>>> exactly what you need. The point is, Unix doesn't have something like
>>> that. With VMS you have the choice, with Unix, you don't.
>>>
>>
>> Really?  Then why did Phillip have the problem that started this whole
>> discussion?  VMS did not give him what he needed.  It butchered a file
>> he brought over from somewhere on the web.  Of course, Unix will do
>> that too, but Unix never told you it wouldn't.  :-)
> 
> Phillip had the problem that those silly record delimiters you need in
> stream files turned up in his data. By using some RMS file manipulations 
> they could be removed.

It probably comes as a surprise to you, but a VMS Web Server would
have sent the data with exactly the same <CR><LF> characters embedded
in the text. The problem was most likely that the receiving client, on
VMS, did not properly convert the incoming NETASCII data into VMS
data.
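
For what it's worth, the conversion a receiving client has to do can be
sketched in a few lines of Python.  This is only an illustration under
stated assumptions, not what any actual VMS client does: it follows the
Telnet NVT convention (<CR><LF> ends a line, <CR><NUL> stands for a bare
carriage return), and the length-prefixed output is loosely modelled on
the idea of a counted record, not on the real RMS on-disk format.  The
function names are hypothetical.

```python
import struct


def netascii_to_records(data):
    """Split NETASCII bytes into records, dropping the CR LF terminators.

    CR NUL is decoded as a literal carriage return inside a record.
    """
    records = [line.replace(b"\r\x00", b"\r") for line in data.split(b"\r\n")]
    # A trailing CR LF leaves one empty piece after the final split.
    if records and records[-1] == b"":
        records.pop()
    return records


def to_length_prefixed(records):
    """Encode records as a 2-byte little-endian length followed by the data.

    Illustrative layout only; the delimiter no longer appears in the
    stored data, so a record may contain any byte value.
    """
    out = bytearray()
    for record in records:
        out += struct.pack("<H", len(record))
        out += record
    return bytes(out)


incoming = b"line one\r\nover\r\x00strike\r\nlast\r\n"
records = netascii_to_records(incoming)
encoded = to_length_prefixed(records)
print(records)
```

A client that skips the first step and stores the bytes as they arrived
ends up with the delimiters inside the data, whatever the file attributes
say.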

bill