[Info-vax] Re: Spiralog, RMS Journaling (was Re: FREESPADRIFT)

Mon Jun 27 08:40:55 EDT 2016

On 2016-06-24 20:27, VAXman- at SendSpamHere.ORG wrote:
> In article <nkjrcr$4ra$1 at dont-email.me>, Stephen Hoffman <seaohveh at hoffmanlabs.invalid> writes:
>> On 2016-06-24 16:54:35 +0000,   VAXman-  @SendSpamHere.ORG said:
>>
>>> In article <yE7BjmIxxkSS at eisner.encompasserve.org>,
>>> koehler at eisner.nospam.decuserve.org (Bob Koehler) writes:
>>>> In article <nkhf9i$7s3$1 at Iltempo.Update.UU.SE>, Johnny Billquist
>>>> <bqt at softjar.se> writes:
>>>>>
>>>>> Uh? Say what? Everything in TCP/IP is just a stream of bytes. There are
>>>>> no blocks, nothing is sent in any multiple of blocks. (And besides,
>>>>> text files in Unix do not have CR and LF in them. They just have LF.
>>>>> Which is why I was complaining about Unix ftp implementations, which
>>>>> often lie about file size, and sometimes cheat when transferring in
>>>>> text mode. These protocols were not designed by Unix people...)
>>>>
>>>> So how does UNIX solve it?  By lying about it?  Does that work anyhow?
>>>> If so, then why can't VMS lie about it?  Or do both UNIX and VMS have
>>>> to read the file twice to get it right?
>>>
>>> Maybe you'll get an answer but I'd suggest you don't hold your breath.
>>
>> Unix returns the file size in bytes.
>>
>> As for TCP presenting a stream and not datagrams, more than a few
>> neophyte developers have been derailed by that detail.  There's no
>> one-to-one mapping of write I/O size to read I/O size with TCP.  One
>> TCP write can produce one read, or potentially as many single byte read
>> I/O requests as bytes were written.
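The stream behaviour described above is why a receiver loops until it has the number of bytes it expects, rather than assuming one read per write. A minimal sketch in Python, assuming a connected stream socket; recv_exact is a name made up for this illustration, not part of any library:

```python
import socket

def recv_exact(sock, nbytes):
    """Read exactly nbytes from a stream socket, however the data is chunked.

    recv_exact is a hypothetical helper for this sketch. It relies only on
    socket.recv(), which may return fewer bytes than requested.
    """
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty read on a stream socket means the peer closed the connection.
            raise ConnectionError("peer closed with %d bytes outstanding" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The caller still has to know how many bytes to expect, which is where a length learned up front (such as a file size) comes in.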
>>
>> For file transfers, the app developer chooses how much data to toss
>> over the connection.  That might be records from a file or records
>> synthesized by the network server for a network protocol, or whatever
>> hunk of data the developer thought was appropriate.  Particularly with
>> 64-bit addressing and a flat address space, it wouldn't surprise me to
>> see a few just read and write the whole file.
>>
>> For those on systems that don't have to use socket I/O, they'll call
>> the file transfer framework or whatever the local analog; libssh
>> underneath ssh has a callable interface.  Though AFAICT, there's no
>> libssh available with the HPE ssh bits.  There is a libcurl port around
>> for OpenVMS.  OpenVMS itself never sprouted a local callable copy akin
>> to macOS and copyfile(3) -- beyond callable convert which can usually
>> get you there, and probably callable backup, or probably the FTSV/FTSO
>> spool layered product bits for those that have access to that -- though
>> there was some work on providing that.
>>
>> Having a simple call that gets you the user file size would be handy,
>> at least for stream files and analogous.   Getting the user data size
>> of a NoSQL, or metadata-enriched RMS file formats, or a relational
>> database file, is rather less useful, so that'd best be the size of the
>> whole wad that needs to be transferred.  Whether that's blocks or not
>> matters little.
>>
>> But then this whole block size stuff will get even more interesting
>> if/when VSI adds support for native access to the two and four kibibyte
>> sector sizes that are now available.  EFI sees those differences as do
>> a few other giblets, but most users haven't had to deal with that yet.
>
> You don't need to explain that to me.
>
> I've been trying to get Johnny to realize that there are numerous ways to
> represent records in a VMS file.

And you don't have to explain that to me. I know way more of the 
internals of these things than I should have to. While not specifically 
RMS-32, I've modified RMS-11, ODS-1 and FCS enough to last me a 
lifetime, and it's really no different than RMS-32.

And just because there are numerous ways to store a file does not mean 
that I should support only one of them.

>  In *ix files (text files) there's a <LF>
> at the end of a string of bytes.  VMS can support that and, if he were to
> use that for his files, he'd be able to get the sought after byte size.  I
> don't see why there's such an inability to comprehend it.

What I can't comprehend is your inability to understand the problem, or 
that having one more piece of metadata actually could help for a rather 
common case. We've been running this thread way longer than I ever 
thought was needed.

Who cares that VMS can store files in a way that is compatible with Unix? 
That is not the answer. You still have various files, in various formats 
on VMS. How it looks under Unix has no bearing.

>  Variable length
> files have a 2-byte length count prefixing each record -- akin to having a
> total file byte count for his purposes.  However, that length is figured
> into the total file byte count and that's NOT appropriate for a protocol
> that's sending <byte><byte><byte><byte>...<byte><byte><LF>.  Selecting a
> file format that reflects the data will get him his byte count assuming a
> <byte><byte><byte><byte>...<byte><byte><LF> transfer protocol.  Ignoring
> that will only cause him to acquire the proper and true file size, but it
> will be an incorrect size for his protocol transfers.

You are making the assumption that the files will be created 
specifically for the need and purpose, which is a broken assumption. A 
web server is expected to serve content that already exists. And if that 
is created by a text editor (not uncommon), it will normally be in the 
standard format for a text file, which on VMS would be variable sized 
sequential records with implied CRLF.

This is the most common type of file you will be serving. Going on 
rambling about how it will be easy to figure out the length of a 
stream-LF file could not be more irrelevant.

However, I'm glad you at least acknowledge that getting the plain 
content size of a sequential record file in VMS is not possible without 
actually reading through the file. Because this is the problem, and this 
is what you need to do today.
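
To make the "reading through the file" step concrete, here is a rough sketch in Python. It assumes a simplified picture of a variable-length sequential file: each record is a 2-byte little-endian count followed by the data, padded to an even byte boundary. It ignores details such as VFC headers and records that are not allowed to span blocks. The names make_image and content_size are made up for this illustration, and the terminator length (2 for CRLF) is an assumption about what the transfer protocol sends per record.

```python
import struct

def make_image(records):
    """Build a simplified variable-length record image (hypothetical helper)."""
    out = bytearray()
    for rec in records:
        out += struct.pack("<H", len(rec))   # 2-byte little-endian record length
        out += rec
        if len(rec) & 1:
            out += b"\x00"                   # pad odd-length records to a word boundary
    return bytes(out)

def content_size(image, terminator_len=2):
    """Walk every record and sum payload bytes plus one terminator per record.

    terminator_len=2 models an implied CRLF per record; use 1 for LF only.
    """
    offset = 0
    total = 0
    while offset + 2 <= len(image):
        (reclen,) = struct.unpack_from("<H", image, offset)
        offset += 2
        if offset + reclen > len(image):
            raise ValueError("record runs past end of image")
        total += reclen + terminator_len
        offset += reclen + (reclen & 1)      # skip the data and any pad byte
    return total
```

For three records of 3, 0 and 6 bytes the image in this sketch is 16 bytes long, while the size on the wire with CRLF terminators is 15. The only way to get the second number here is the loop over every record, which is the cost being complained about.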

> I haven't looked at the code for c$stat() on VMS, which can return a file
> byte count, to see if there's logic inherent in that code to return a file
> size biased to a *ix stream LF file.  I'd wager it's the <end_of_file_block
> -1>*512+<end_of_file_byte> computation I've been discussing because that'd
> be just too much to handle if the file was binary.  C$stat() shouldn't be
> making assumptions about the file content.
>
> Anyway, this whole thread has gone on all too long.  Sometimes, no matter
> how bright the light, the blind still refuse to see it.

Couldn't agree more.

	Johnny