[Info-vax] Current VMS engineering quality, was: Re: What's VMS up to these

Johnny Billquist bqt at softjar.se
Fri Mar 16 22:58:03 EDT 2012


On 2012-03-16 17.49, Bob Eager wrote:
> On Fri, 16 Mar 2012 17:39:12 -0700, Johnny Billquist wrote:
>
>> On 2012-03-16 16.48, Bob Eager wrote:
>>> On Sat, 17 Mar 2012 00:43:26 +0100, Fritz Wuehler wrote:
>>>
>>>> Johnny Billquist<bqt at softjar.se>   wrote:
>>>>
>>>>> 2. Unix distributed networks using ethernet and shared disks is not
>>>>> robust at all. You must be totally uninformed if you claim this. Have
>>>>> you ever used a machine with an NFS root? Any time the server
>>>>> stopped, rebooted, or whatever, all clients *freeze*. Not even
>>>>> rebooting, unless you press the power switch. You just sit there
>>>>> waiting for the NFS server to wake up again.
>>>>
>>>> Correct. This just happened to me (facepalm) today on a modern Linux
>>>> system 2.6.29.something kernel. I didn't think and took my NFS box
>>>> offline and when my Linux client couldn't get to the mounted share
>>>> ..........................
>>>
>>> So, that's a Linux problem.
>>
>> No. That is a general problem with all Unix systems, and is not specific
>> to a certain implementation, but an effect of the whole design of Unix and
>> NFS. There is no Unix anywhere that will behave any different.
>
> Despite the fact that they have radically different source code and
> implementation? I think not.

*Yes*. Despite different source code. This is not a "bug" in the source 
code. This is an effect of the semantics of the system. It is this way 
by design, not accident.

> I say it again. UNIX is not an operating system. There is no one UNIX.

I agree. But that doesn't change the way this behaves. It's as fundamental 
as the Unix philosophy that all files are a stream of bytes. Even though 
you have different implementations, they all give you this concept.

>>>> Solution: reboot NFS box. Stupid, stupid, stupid. Can't the UNIX
>>>> idiots *ever* do anything correctly?
>>>
>>> UNIX is not an operating system. It's a specification, and that
>>> specification doesn't include NFS anyway. Sweeping generalisations
>>> don't help. Some systems conforming to 'UNIX' work OK in this
>>> situation, and some don't.
>>
>> No. Not a single Unix behaves "OK" in this situation. They can't. It's a
>> part of the basic design of the whole system. Or perhaps rather, a part
>> of the effects of the basic design, as I'm sure they didn't
>> intentionally design it with this effect in mind. But it is an effect of
>> the design.
>
> Which part? Explain.

NFS essentially tries to give the same guarantees as a local disk based 
filesystem. Local disk based filesystems don't "fail", except for 
physical I/O errors that are not recoverable. NFS was designed in a way 
that would allow it to continue if the server went down, and then came 
up again. Thus, if you are doing an operation on an NFS filesystem and 
the server is not responding, NFS will hang and retry until the server 
does respond again. And this is normally not interruptible in any way. 
You can give options to mount ("soft" and "intr") to tell it to not 
hang, and to allow interrupts for hanging NFS calls, but that instead 
means that you can silently get data corruption, so just about anyone 
will tell you to not use those options.
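
As a rough illustration of the mount options discussed above, here is a 
small sketch that classifies the NFS entries of a mount table as hard or 
soft. It assumes the Linux /proc/mounts line layout (device, mount point, 
type, options); the function name and the sample lines are made up for 
the example, and other Unix systems expose the table differently.

```python
def classify_nfs_mounts(mount_table: str) -> dict[str, str]:
    """Map each NFS mount point to "hard" or "soft".

    Assumes /proc/mounts-style lines: device mountpoint fstype options ...
    A mount is treated as hard unless "soft" appears in its options,
    since hard is the usual default.
    """
    result = {}
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        if fstype not in ("nfs", "nfs4"):
            continue  # only NFS client mounts are of interest here
        opts = set(options.split(","))
        result[mount_point] = "soft" if "soft" in opts else "hard"
    return result


# Hypothetical sample data, not taken from any real system.
SAMPLE = """\
/dev/sda1 / ext3 rw,relatime 0 0
server:/export/home /home nfs rw,hard,intr,vers=3 0 0
server:/export/scratch /scratch nfs rw,soft,timeo=30 0 0
"""

if __name__ == "__main__":
    for mount_point, kind in classify_nfs_mounts(SAMPLE).items():
        print(mount_point, kind)
```

Anything reported as hard here is a mount where a client can be expected 
to block until the server answers again.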

At the lower layers inside Unix (any Unix, I'd say), you cannot even 
pass an error up from something that has file system semantics in a way 
that will translate into EINTR at the user level. Local disk-like 
devices are normally expected to always return within a very short time 
with data, so they are not required to be interruptible.

(You have a similar effect with tape drives, where your user program can 
hang for several minutes if the tape drive is doing some slow operation, 
since tape drivers also are non-interruptible, and you are in fact down 
in a protected piece of code deep in the kernel, that Unix cannot back 
out of, or complete until the device has responded.)

In short, there are places in the kernel where your user program can 
get stuck, and where even a kill signal will only be queued for later 
processing (effectively ignoring even a "kill -9").
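
This stuck state is visible from outside: ps reports such a process in 
state "D" (uninterruptible sleep). A small sketch of how one might check 
for it, assuming the Linux /proc/<pid>/stat layout where the state letter 
follows the parenthesised command name. The function names and the sample 
lines are made up for illustration.

```python
def parse_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the LAST ')' rather than on
    whitespace from the start of the line.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def is_uninterruptible(stat_line: str) -> bool:
    """True if the process is in state D (uninterruptible sleep)."""
    return parse_state(stat_line) == "D"


# Hypothetical sample lines, truncated to the first few fields.
STUCK = "4242 (ls) D 4100 4242 4100 34816 4242"
NORMAL = "4243 (my (odd) prog) S 4100 4243 4100 34816 4243"

if __name__ == "__main__":
    print(is_uninterruptible(STUCK))
    print(is_uninterruptible(NORMAL))
```

A process that stays in state D is sitting in one of those kernel paths, 
and signals sent to it stay pending until the kernel call returns.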

Google for this if you want more details. I did a quick check, and 
you'll find any number of references, information and general hits if 
you search. :-)

>> Show me a single Unix system that does not work this way. I'd be very
>> interested in digging into that to see what they have done to change
>> this in that case.
>
> Work which way? You are just saying 'NFS doesn't work properly'. Which
> part of the operating system, exactly, are you blaming which behaviour on?

I'm saying NFS is working properly. I'm also saying that if the NFS 
server goes down, your NFS clients will hang until the server comes 
back. And that this is true for *all* Unix-like systems.
You claimed that it wasn't, so I'm asking you to provide an example. 
Just a single Unix-like system which does not hang if the NFS server 
goes down is good enough.

Also, the implications if the NFS server goes down are far wider than 
one first realizes. Any file operation, command, or whatever, means 
a file lookup, which often starts somewhere in your NFS file structure, 
so doing just about anything will hang.
You can't even give any commands at the shell, since your PATH variable 
most likely also has your NFS directories included, and thus the shell 
hangs as soon as it tries to look up any command you type.
File completion? Same thing. It is basically impossible to even sneeze 
without hanging your NFS clients if the server is down.
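
To make the PATH point concrete, here is a minimal sketch of a 
left-to-right command lookup. It is not how any particular shell is 
implemented (real shells also cache earlier hits); probe_path, the fake 
file set and the directory names are all made up for the example. Every 
candidate that gets probed is a file lookup, and a lookup that lands on 
a hard NFS mount with a dead server would block.

```python
import os
from typing import Callable, List, Optional, Tuple


def probe_path(command: str, path: str,
               exists: Callable[[str], bool] = os.path.exists
               ) -> Tuple[List[str], Optional[str]]:
    """Walk a Unix-style PATH left to right looking for `command`.

    Returns (candidates probed in order, first hit or None). The
    `exists` check is injectable so the walk can be shown without
    touching a real (possibly hanging) filesystem.
    """
    probed = []
    for directory in path.split(":"):
        if not directory:
            directory = "."  # an empty PATH entry means the current directory
        candidate = directory + "/" + command
        probed.append(candidate)
        if exists(candidate):
            return probed, candidate
    return probed, None


# Hypothetical setup: the NFS directory comes first in PATH.
FAKE_FS = {"/usr/bin/ls"}
EXAMPLE_PATH = "/nfs/tools/bin:/usr/local/bin:/usr/bin"

if __name__ == "__main__":
    probed, hit = probe_path("ls", EXAMPLE_PATH, exists=FAKE_FS.__contains__)
    print(probed)
    print(hit)
```

In this made-up setup the NFS directory is probed first, even though the 
command actually lives on the local disk.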

>>>>> 4. Unix does normally not crash, but instead freeze. And not only if
>>>>> the network goes down, but also if the single machine serving the
>>>>> disk goes down.
>>>>
>>>> Exactly what happened.
>>>
>>> On *your* system.
>>
>> On *every* system. This story is as old as NFS itself. Long before Linux
>> even existed.
>
> So, it's a problem with NFS. Not UNIX.

??? I must admit that I have not examined the behavior of NFS on 
non-Unix systems, but I doubt you'll see the same problem there. Even 
more, apart from Unix, no other systems I know of can use NFS as the 
equivalent of the root file system. It is very much tied into Unix here...

>>>>> Go back to playing with Windows, and stop posting to this newsgroup,
>>>>> since you obviously have little to contribute anyway. And VMS and DEC
>>>>> bashing in general is not classified as "contributing".
>>>
>>> Interesting that VMS bashing is not allowed, but UNIX bashing is (by
>>> some people's rules, anyway). Free speech?
>>
>> Last I looked, this was comp.os.vms. Feel free to bash VMS all you want
>> on comp.os.unix. :-)
>
> I don't read it. Anyway, that's a childish approach.

The point is, there are several different newsgroups so that you can 
limit what you read to the things that you find interesting/relevant. If 
we decide to talk about everything in just one newsgroup, it becomes 
totally unusable, unwieldy, and so noisy that your eyes would bleed. So 
we have different newsgroups, where people can talk about different things.
An argument like yours above is basically childish. "I want to talk 
about <insert your topic> *here*. Why should I have to go to somewhere 
more appropriate? I don't care what others think. As long as I get what 
I want..."

You get the picture I hope... ;-)

>>> You have...lots of UNIX-spec systems out there.
>>
>> All of them behaving the same way, yes...
>>
>> I know way more Unix than I'd ever want to. I've hacked the innards of
>> the BSD4.3 Reno kernel, lots of NetBSD whacking, and lots of Linux
>> whacking. All in the kernel. And I cannot count how much stuff I've done
>> at the user level of different Unix systems.
>>
>> Oh, and I have even less clue about how much I've hacked the 2BSD kernel
>> and userland. After all, if it is a PDP-11, it can't be all bad. :-) But
>> 2BSD doesn't have NFS.
>
> Too early for that; NFS came later. And I was hacking UNIX kernel back
> in the Sixth Edition days..

Nice! I have not really touched 6th ed ever...

	Johnny


