[Info-vax] Current VMS engineering quality, was: Re: What's VMS up to these
Johnny Billquist
bqt at softjar.se
Fri Mar 16 22:58:03 EDT 2012
On 2012-03-16 17.49, Bob Eager wrote:
> On Fri, 16 Mar 2012 17:39:12 -0700, Johnny Billquist wrote:
>
>> On 2012-03-16 16.48, Bob Eager wrote:
>>> On Sat, 17 Mar 2012 00:43:26 +0100, Fritz Wuehler wrote:
>>>
>>>> Johnny Billquist<bqt at softjar.se> wrote:
>>>>
>>>>> 2. Unix distributed networks using ethernet and shared disks is not
>>>>> robust at all. You must be totally uninformed if you claim this. Have
>>>>> you ever used a machine with an NFS root? Any time the server
>>>>> stopped, rebooted, or whatever, all clients *freeze*. Not even
>>>>> rebooting, unless you press the power switch. You just sit there
>>>>> waiting for the NFS server to wake up again.
>>>>
>>>> Correct. This just happened to me (facepalm) today on a modern Linux
>>>> system 2.6.29.something kernel. I didn't think and took my NFS box
>>>> offline and when my Linux client couldn't get to the mounted share
>>>> ..........................
>>>
>>> So, that's a Linux problem.
>>
>> No. That is a general problem with all Unix systems, and is not specific
>> to a certain implementation, but an effect of the whole design of Unix and
>> NFS. There is no Unix anywhere that will behave any different.
>
> Despite the fact that they have radically different source code and
> implementation? I think not.
*Yes* Despite different source code. This is not a "bug" in the source
code. This is an effect of the semantics of the system. It is this way
by design, not accident.
> I say it again. UNIX is not an operating system. There is no one UNIX.
I agree. But that doesn't change the way this behaves. It's as fundamental
as the Unix philosophy that all files are a stream of bytes. Even though
you have different implementations, they all give you this concept.
>>>> Solution: reboot NFS box. Stupid, stupid, stupid. Can't the UNIX
>>>> idiots *ever* do anything correctly?
>>>
>>> UNIX is not an operating system. It's a specification, and that
>>> specification doesn't include NFS anyway. Sweeping generalisations
>>> don't help. Some systems conforming to 'UNIX' work OK in this
>>> situation, and some don't.
>>
>> No. Not a single Unix behaves "OK" in this situation. They can't. It's a
>> part of the basic design of the whole system. Or perhaps rather, a part
>> of the effects of the basic design, as I'm sure they didn't
>> intentionally design it with this effect in mind. But it is an effect of
>> the design.
>
> Which part? Explain.
NFS essentially tries to give the same guarantees as a local disk based
filesystem. Local disk based filesystems don't "fail", except for
physical I/O errors that are not recoverable. NFS was designed in a way
that would allow it to continue if the server went down and then came
up again. Thus, if you are doing an operation on an NFS filesystem and
the server is not responding, NFS will hang and retry until the server
does respond again. And this is normally not interruptible in any way.
You can give options to mount (typically "soft" and "intr") to tell it
not to hang, and to allow interrupts for hanging NFS calls, but that
instead means that you can silently get data corruption, so just about
anyone will tell you not to use those options.
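To make the difference between the two behaviours concrete, here is a
small Python model of the client retry policy. It is only a sketch, not
real NFS code: the fake server, the function names, and the choice of
EIO as the error are all assumptions made for illustration. The
"retrans" parameter is named after the mount option of the same name.

```python
import errno


class ServerDown(Exception):
    """Raised by the fake transport when the server does not answer."""


def make_flaky_server(down_for):
    """Return a fake RPC call that fails `down_for` times, then succeeds."""
    state = {"remaining": down_for}

    def call():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ServerDown()
        return b"data"

    return call


def nfs_read(call, hard=True, retrans=3):
    """Model of the client retry policy (hypothetical, for illustration).

    hard=True:  retry forever; the caller only ever sees success.
    hard=False: give up after `retrans` attempts and report an I/O error.
    Returns (data, number_of_attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return call(), attempts
        except ServerDown:
            if not hard and attempts >= retrans:
                raise OSError(errno.EIO, "server not responding")
            # A real client would sleep and back off here before retrying.
```

With hard=True the caller never sees an error; it just waits however
long the server takes. With hard=False the caller gets an error it may
well not be prepared for, which is where the risk of silently damaged
data comes from.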
At the lower layers inside Unix (any Unix, I'd say), you cannot even
pass an error up from something that has file system semantics in a way
that will translate into EINTR at the user level. Local disk-like
devices are normally expected to always return with data within a very
short time, so they are not required to be interruptible.
(You have a similar effect with tape drives, where your user program can
hang for several minutes if the tape drive is doing some slow operation,
since tape drivers also are non-interruptible, and you are in fact down
in a protected piece of code deep in the kernel, that Unix cannot back
out of, or complete until the device has responded.)
In short, there are places in the kernel where your user program can
get stuck, and where even a kill signal will only be queued for later
processing (effectively ignoring even a "kill -9").
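On Linux, a process stuck like this typically shows up in state "D"
(uninterruptible sleep) in ps. As a rough sketch, assuming a Linux-style
/proc filesystem, the following Python lists such processes. The
function names are made up for this example, and other Unix systems
expose this information differently.

```python
import os


def parse_state(stat_line):
    """Extract the one-letter state from a Linux /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces
    or parentheses, so take the first field after the LAST ')'.
    """
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def uninterruptible_pids(proc="/proc"):
    """Return PIDs currently in state 'D' (assumes a Linux /proc layout)."""
    pids = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            path = os.path.join(proc, name, "stat")
            with open(path, errors="replace") as f:
                if parse_state(f.read()) == "D":
                    pids.append(int(name))
        except (OSError, ValueError, IndexError):
            continue  # process exited or line was unreadable; skip it
    return pids
```

Reading /proc does not touch the hung filesystem, so this sort of check
is safe to run from a shell that still works.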
Google for this if you want more details. I did a quick check, and
you'll find any number of references, information and general hits if
you search. :-)
>> Show me a single Unix system that does not work this way. I'd be very
>> interested in digging into that to see what they have done to change
>> this in that case.
>
> Work which way? You are just saying 'NFS doesn't work properly'. Which
> part of the operating system, exactly, are you blaming which behaviour on?
I'm saying NFS is working properly. I'm also saying that if the NFS
server goes down, your NFS clients will hang until the server comes
back. And that this is true for *all* Unix-like systems.
You claimed that it wasn't, so I'm asking you to provide an example.
Just a single Unix-like system which does not hang if the NFS server goes
down is good enough.
Also, the implications if the NFS server goes down are far wider than
one first realizes. Any file operation, command, or whatever, means
a file lookup, which often starts somewhere in your NFS file structure,
so basically doing just about anything will hang.
You can't even give any commands at the shell, since your PATH variable
most likely also has your NFS directories included, and thus the shell
hangs as soon as it tries to look up any command you type.
File completion? Same thing. It is basically impossible to even sneeze
without hanging your NFS clients if the server is down.
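If you want to see how exposed a given login is, a sketch like the
following shows which PATH entries sit on NFS. It assumes the Linux
/proc/mounts layout (device, mount point, filesystem type, ...) and the
type names "nfs" and "nfs4"; the function names are hypothetical. Note
that it only compares strings and never stats the directories, since
touching a dead mount is exactly what hangs.

```python
def nfs_mount_points(mounts_text):
    """Return mount points of type nfs/nfs4 from /proc/mounts-style text.

    Assumed layout per line: device mountpoint fstype options dump pass.
    Escaped characters in mount points (such as \\040) are not decoded.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1].rstrip("/") or "/")
    return points


def path_entries_on_nfs(path_var, mounts_text):
    """List PATH entries at or below an NFS mount point.

    Pure string comparison: deliberately no stat() or realpath(), so
    symlinks into NFS are not detected, but nothing here can hang.
    """
    points = nfs_mount_points(mounts_text)
    hits = []
    for entry in path_var.split(":"):
        if not entry:
            continue  # an empty entry means the current directory
        e = entry.rstrip("/") or "/"
        for p in points:
            prefix = p if p.endswith("/") else p + "/"
            if e == p or e.startswith(prefix):
                hits.append(entry)
                break
    return hits
```

In practice you would call it as path_entries_on_nfs(os.environ["PATH"],
open("/proc/mounts").read()), preferably before the server goes away.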
>>>>> 4. Unix does normally not crash, but instead freeze. And not only if
>>>>> the network goes down, but also if the single machine serving the
>>>>> disk goes down.
>>>>
>>>> Exactly what happened.
>>>
>>> On *your* system.
>>
>> On *every* system. This story is as old as NFS itself. Long before Linux
>> even existed.
>
> So, it's a problem with NFS. Not UNIX.
??? I must admit that I have not examined the behavior of NFS on
non-Unix systems, but I doubt you'll see the same problem there. Even
more, apart from Unix, no other systems I know of can use NFS as the
equivalent of the root file system. It is very much tied into Unix here...
>>>>> Go back to playing with Windows, and stop posting to this newsgroup,
>>>>> since you obviously have little to contribute anyway. And VMS and DEC
>>>>> bashing in general is not classified as "contributing".
>>>
>>> Interesting that VMS bashing is not allowed, but UNIX bashing is (by
>>> some people's rules, anyway). Free speech?
>>
>> Last I looked, this was comp.os.vms. Feel free to bash VMS all you want
>> on comp.os.unix. :-)
>
> I don't read it. Anyway, that's a childish approach.
The point is, there are several different newsgroups so that you can
limit what you read to the things that you find interesting/relevant. If
we decide to talk about everything in just one newsgroup it becomes
totally unusable, unwieldy, and so noisy that your eyes would bleed. So
we have different newsgroups, where people can talk about different things.
An argument like yours above is basically childish. "I want to talk
about <insert your topic> *here*. Why should I have to go to somewhere
more appropriate? I don't care what others think. As long as I get what
I want..."
You get the picture I hope... ;-)
>>> You have...lots of UNIX-spec systems out there.
>>
>> All of them behaving the same way, yes...
>>
>> I know way more Unix than I'd ever want to. I've hacked the innards of
>> the BSD4.3 Reno kernel, lots of NetBSD whacking, and lots of Linux
>> whacking. All in the kernel. And I cannot count how much stuff I've done
>> at the user level of different Unix systems.
>>
>> Oh, and I have even less clue about how much I've hacked the 2BSD kernel
>> and userland. After all, if it is a PDP-11, it can't be all bad. :-) But
>> 2BSD doesn't have NFS.
>
> Too early for that; NFS came later. And I was hacking the UNIX kernel back
> in the Sixth Edition days..
Nice! I have not really touched 6th ed ever...
Johnny