[Info-vax] Current VMS engineering quality, was: Re: What's VMS up to these

Sat Mar 17 18:12:14 EDT 2012

Nomen Nescio <nobody at dizum.com> wrote:

(snip, I wrote)
>> If you don't like it, use a soft mount, otherwise that is 
>> considered correct. 

>> If you are writing to a disk, and the disk doesn't respond fast enough,
>> you don't normally expect the system to just throw away the data you
>> thought you wrote, do you? 

> I don't think excuses for bad design are going to help. The answer is to
> time out the request if the NFS server doesn't respond and then force an
> unmount, thereby freeing the client system. 

More or less, that is what soft mounts do. I don't remember the
details, as I never used them. 

> Where is the deadman switch in NFS? The only answer is rebooting 
> the client or server, you get the same data loss so what's the 
> practical benefit of your data loss analysis? 

With hard mounts, the client will wait indefinitely for the server to
come back, and then finish the write. I did that once with diskless
clients over a weekend.

With diskless clients, they aren't going to be doing anything without
the server, so there really isn't much reason not to wait. Especially
if the root file system is on the server! 
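
To make the hard vs. soft difference concrete, here is a toy model in
Python. It is only a sketch of the retry policy, not how any NFS client
is implemented. The names send_request, ServerUnavailable, retrans and
timeo are made up for the illustration (the last two borrow the names
of the usual NFS mount options).

```python
import time


class ServerUnavailable(Exception):
    """Raised by the toy request function when the server does not answer."""


def send_request(request, retrans=None, timeo=0.0):
    """Call request() until it succeeds.

    retrans=None models a hard mount: keep retrying forever.
    retrans=N models a soft mount: after N retries, give up and
    report an I/O error to the caller.
    """
    attempts = 0
    while True:
        try:
            return request()
        except ServerUnavailable:
            if retrans is not None and attempts >= retrans:
                # Soft behavior: the caller sees a failed operation.
                raise OSError("server not responding, giving up")
            attempts += 1
            time.sleep(timeo)  # wait before the next retry
```

In the hard case the caller simply blocks until the server answers. In
the soft case the caller gets an error and has to decide what to do
with the data it was trying to write.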

> Either the server has to come up or you lose data, *in 
> their shitty design*.

> There should be a two phase commit to make sure the server 
> throws away any data and the client doesn't consider it committed. 

That is a separate question. There are synchronous and asynchronous
writes. With a synchronous write, the server doesn't reply until the
data is on an actual disk. (It gets a little more complicated when the
disk has an internal write cache.)

Asynchronous is faster because the server replies as soon as the data
is in its cache, before it has reached the disk. If the server goes
down between accepting the data from the client and writing it to
disk, then there is still data loss. Again, if you care about
your data, use synchronous writes.
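
From the application side, one way to ask for this is to call fsync
before treating the data as written. A minimal sketch in Python,
assuming a POSIX-like system; durable_write and the path are
hypothetical names. Whether the data really is on disk when fsync
returns still depends on how the server and its disks are configured.

```python
import os


def durable_write(path, data):
    """Write data to path and return only after fsync has succeeded."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to push the data to stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(data)
```

If fsync raises OSError, the program knows the data did not make it
and can react, instead of finding out later (or never).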

> IBM can do this stuff correctly, UNIX can't. UNIX and NFS 
> are broken, this is just one of a million stupid UNIX non-designs.

What does IBM do? On which system?

>> Why would you expect that in the case of an NFS disk?

>> As previously mentioned, the result is data loss.

> Two phase commit. Don't depend on clients and servers to always get 
> along, plan for the times they don't. Don't lose data. Don't hang 
> a client machine. It's all basic, obvious stuff for a serious OS. 
> UNIX needs work, lots and lots of work.

I suppose you could return a fatal write error to the client program,
which pretty much means data loss. How many programs have a way to
handle a write failure by writing the data somewhere else?
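
For what it is worth, such handling could look something like the
following Python sketch. write_with_fallback and both paths are
hypothetical. It only helps when the write actually returns an error,
as with a soft mount. On a hard mount the call just blocks.

```python
import os


def write_with_fallback(data, primary, fallback):
    """Try to write data to primary; on an I/O error, try fallback.

    Returns the path that was written. Raises the last OSError if
    both attempts fail.
    """
    last_error = None
    for path in (primary, fallback):
        try:
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                # Errors can show up here rather than at write().
                os.fsync(f.fileno())
            return path
        except OSError as exc:
            last_error = exc
    raise last_error
```

The program still has to keep the data in memory until one of the
writes succeeds, which is exactly what most programs don't bother
to do.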

-- glen