[Info-vax] Assembly languages

Tue Apr 12 09:10:52 EDT 2022

Small caveat: I might have reversed character and block device (I tend to
do that sometimes...)

Block - goes through the Unix blocking/deblocking layer, giving you a
stream of bytes.

Character - the basic, raw access to the device, where you deal with the
device's blocks yourself.

I do find the names confusing, because I tend to think that character 
would be the one that gives a stream of bytes, but it's not...

On 2022-04-12 10:30, gah4 wrote:
> On Monday, April 11, 2022 at 11:42:00 PM UTC-7, Johnny Billquist wrote:
> 
> (snip, I wrote)
>>> Disk files on Unix-like systems and DOS/Windows are just a string of bytes.
>>> Any block structure is hidden by the OS disk cache.
> 
>> Well, it's not hidden by the OS disk cache in Unix. It's hidden by the
>> device driver framework. That's what the block and character devices
>> are about.
> 
> If you do a read() system call on a disk file, it either fills the buffer or
> writes the bytes left in the file. There used to be a rule that you should
> never use the block disk device.  That restriction might be gone, but I
> am not sure what happens if you do use it.  When I do write raw disk
> devices, I use the character device.

If you do a read on the raw disk, you'll be getting the blocks. You can 
only start your operation at a block boundary. Depending on the 
hardware, you might be able to not read/write a full block, though. But 
in that case, the rest will be padded by something, and not left untouched.

Almost no program would/should ever need to do this, but it works fine.
And obviously, programs like disklabel/mkfs/newfs/fsck and the like
need to access the disks this way.
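
The block-boundary rule for raw access can be sketched roughly like this.
It is only an illustration: the 512-byte block size and both function
names are my assumptions, and a real device may report a different size.
The idea is that a byte range gets widened to whole blocks before the
read, and the unwanted bytes are sliced off afterwards.

```python
import os

BLOCK_SIZE = 512  # assumed block size; real devices may differ


def aligned_span(offset, length, block_size=BLOCK_SIZE):
    """Return (start, size) of the whole-block span covering a byte range."""
    start = (offset // block_size) * block_size
    end = -(-(offset + length) // block_size) * block_size  # round up
    return start, end - start


def read_bytes_via_blocks(fd, offset, length, block_size=BLOCK_SIZE):
    """Read whole blocks only, then slice out the bytes actually wanted."""
    start, size = aligned_span(offset, length, block_size)
    data = os.pread(fd, size, start)  # starts on a block boundary
    skip = offset - start
    return data[skip:skip + length]
```

For example, asking for 100 bytes at offset 700 turns into one read of
512 bytes starting at offset 512.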

Using the block device is somewhat inadvisable because written data will
be sitting in OS buffers until a sync. And if you just reboot, that data
might be lost. Remember to sync three times. :-)

So the block device is a poor choice for tools that manipulate the file
system directly, since the data is not necessarily on disk after your
write. Also, if it is a mounted file system, Unix keeps its own cached
copies of data, which are not kept in sync with your block access. It is
doubly bad if you try to manipulate a mounted file system, because your
changes will be overwritten again when the OS syncs back its cached data.

So it's not a real restriction to not use the block device when writing 
to a disk, but it just creates extra headaches and potential surprises. 
But if you want to treat the whole disk as a stream of bytes, you have 
to go this way.
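
If you do write through the buffered path, the "data sitting in OS
buffers" problem can at least be narrowed by asking for a flush right
after the write. A minimal sketch, assuming a path you are allowed to
open for writing; the function name is made up for illustration, and
this does nothing about the mounted file system caveat above.

```python
import os


def write_and_flush(path, offset, data):
    """Write data at a byte offset, then ask the kernel to flush it."""
    fd = os.open(path, os.O_WRONLY)
    try:
        written = os.pwrite(fd, data, offset)
        os.fsync(fd)  # wait until the kernel has pushed this file's data out
        return written
    finally:
        os.close(fd)
```

Without the fsync() call, the write can return successfully while the
data still exists only in memory.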

>>> On the other hand, tapes on Unix, and I believe DOS/Windows do have a block
>>> structure. In some cases, it is necessary to preserve that in order to
>>> properly read them. There are virtual tape formats that convert a tape of
>>> blocks into a stream with block marks included, and others to reverse it.
> 
>> Actually, in Unix, tapes are the same story as disks. However, no sane
>> person ever cared to use the character device for tapes, since (as you
>> observe), with tapes there are additional reasons why you want to know,
>> and preserve block information. The character device hides this from you.
>   
> All the Unix tape work I ever did was with the character device.
> Maybe 20 years ago, we did it with tar, where you tell tar the blocking
> factor that you want, and it writes blocks of that size to tape.

So I swapped character and block then. Yes, you use the character device 
to get the raw access to the tape. And you want that. But there is also 
the block device for tapes, where you just treat a tape like a stream of 
bytes. I bet you never used it. Almost no one has, because you don't
actually want that.
What happens there is that Unix internally blocks this up into 1024-byte
blocks that it reads/writes to the tape for you. Just like with a
disk... No indication of block size. No handling of tape marks. It's
all hidden away from you.
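
To make the blocking concrete, here is a rough sketch of what "blocks
this up into 1024-byte blocks" amounts to. It is a toy model and not the
kernel code: the function name is hypothetical, and padding the last
record with zero bytes is my assumption.

```python
RECORD_SIZE = 1024  # record size used by the tape block device


def block_up(stream_bytes, record_size=RECORD_SIZE, pad=b"\0"):
    """Cut a byte stream into fixed-size records, padding the last one."""
    records = []
    for i in range(0, len(stream_bytes), record_size):
        chunk = stream_bytes[i:i + record_size]
        records.append(chunk.ljust(record_size, pad))  # padding is assumed
    return records
```

Whether the program wrote 2500 bytes in one call or in twenty-five calls
of 100 bytes, the tape ends up with the same three 1024-byte records, so
the original write sizes are gone.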

> The Unix read() system call on a tape character device returns one tape
> block, and the length of that block.  On 9-track reel tape drives that I know,
> you can write any block length you want, with the drive writing the block,
> and an inter-block gap to the tape.  Some other tape systems only allow
> for fixed sized blocks.

Right. When you use the character device. Now try the block device. 
:evil grin:.

> More recently, I was working with Ultrium (LTO) drives, which have two
> modes.  In one mode, the drive writes fixed length (usually 512 byte)
> blocks, the other looks more like 9-track drives.  Though in the case
> of LTO all the blocking is virtual. Having the drive in the wrong mode
> confuses programs like tar.

Yeah... That would also mean you could have two layers blocking your 
data, if you aren't careful. Quite a mess.

>> For networking, it's all carried by IP datagrams in the end, which are
>> blocks. TCP then implements a stream of bytes abstraction on top of that.
> 
> Yes, but TCP won't tell you where the boundaries were before the data
> went into the TCP stream, or where the IP boundaries were.

True. And when I read a file from a disk, the system won't tell me about 
the disk block boundaries either. But they do exist. And if I use the 
block driver to a tape, I won't know about tape block boundaries either...
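
If you want boundaries to survive a trip through a byte stream, you have
to carry them in the stream yourself, which is the idea behind the
virtual tape formats mentioned earlier. A minimal sketch, using a
made-up framing (4-byte little-endian length before each record) that is
not meant to match any particular real format:

```python
import struct


def frame(records):
    """Turn a list of records into one byte stream with length prefixes."""
    return b"".join(struct.pack("<I", len(r)) + r for r in records)


def unframe(stream):
    """Recover the original records from a stream produced by frame()."""
    records = []
    pos = 0
    while pos < len(stream):
        (n,) = struct.unpack_from("<I", stream, pos)  # record length
        pos += 4
        records.append(stream[pos:pos + n])
        pos += n
    return records
```

Just concatenating the records gives the same data, but nothing on the
receiving side can say where one record ended and the next began.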

Also - I think Linux, for example, got rid of the block/character
distinction on devices, and doesn't really match Unix here. Traditionally,
all disks and tapes had both a block and a character device driver to
access them.

Here is a nice quote from mtio(4) on 2.11BSD:

"
      A standard tape consists of a series of 1024 byte records
      terminated by an end-of-file.  To the extent possible, the
      system makes it possible, if inefficient, to treat the tape
      like any other file.  Seeks have their usual meaning and it
      is possible to read or write a byte at a time.  Writing in
      very small units is inadvisable, however, because it uses
      most of the tape in record gaps.

      The mt files discussed above are useful when it is desired
      to access the tape in a way compatible with ordinary files.
      When foreign tapes are to be dealt with, and especially when
      long records are to be read or written, the `raw' interface
      is appropriate.  The associated files are named rmt0, ...,
      rmt23, but the same minor-device considerations as for the
      regular files still apply.  A number of other ioctl opera-
      tions are available on raw magnetic tape.
"

   Johnny