[Info-vax] Current VMS engineering quality, was: Re: What's VMS up to these

glen herrmannsfeldt gah at ugcs.caltech.edu
Sun Mar 18 15:27:48 EDT 2012


Fritz Wuehler <fritz at spamexpire-201203.rodent.frell.theremailer.net> wrote:
> glen herrmannsfeldt <gah at ugcs.caltech.edu> wrote:

>> > IBM can do this stuff correctly, UNIX can't. UNIX and NFS 
>> > are broken, this is just one of a million stupid UNIX non-designs.
 
>> What does IBM do? On which system?

> IBM shares DASD in a sysplex, across multiple physical machines, 
> with full integrity. Standard on MVS at least since ESA 
> (circa 1987) maybe earlier.

I am not so sure what they have by now, but they had RESERVE/RELEASE
at least back to OS/360. For DASD with more than one channel interface,
the RESERVE CCW keeps other systems away, and RELEASE allows them to use
the device again. Now, what happens when one system goes down after
RESERVE and before RELEASE? Most likely the other systems hang.
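To see why that failure mode matters, here is a hypothetical local-disk analogy in Python (not mainframe code): an exclusive flock stands in for the device reservation. The kernel releases a flock when its holder dies, so the survivor is not stuck; a channel-level RESERVE held by a dead system has no such automatic cleanup.

```python
import fcntl, os, tempfile

# Hypothetical analogy only: an exclusive flock stands in for a
# device reservation.  Unlike a channel-level RESERVE, the kernel
# drops a flock automatically when the holding process dies.
path = tempfile.NamedTemporaryFile(delete=False).name

pid = os.fork()
if pid == 0:                      # child: "RESERVE", then crash
    f = open(path)
    fcntl.flock(f, fcntl.LOCK_EX)
    os._exit(0)                   # dies without ever "RELEASE"-ing
os.waitpid(pid, 0)

f = open(path)
fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # would raise if still held
recovered = True
```

With a real RESERVE the surviving system would instead wait forever, which is the hang described above.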

(snip)
>> > Two phase commit. Don't depend on clients and servers to always get 
>> > along, plan for the times they don't. Don't lose data. Don't hang 
>> > a client machine. It's all basic, obvious stuff for a serious OS. 
>> > UNIX needs work, lots and lots of work.

Well, a primary use for Sun NFS was diskless clients. There isn't 
much at all that the client can do if it can't get to the server.

But for hosts with disks, especially with cross mounting, yes, it
can be a problem.
 
>> I suppose you could return a fatal write error to the client program,
>> which pretty much means data loss. How many programs have a way to
>> handle a write failure by writing the data somewhere else?

> What difference does that make? Are you saying since the client 
> code is broken we have to accept data loss? Other systems don't 
> work that way. 

What do you tell the client program when the server goes away?

> The transaction gets backed out if the server doesn't respond 
> and the server backs out from its logs when it comes back up. 

And what does the client do in the meantime? 
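For what it's worth, the back-out scheme described above can be sketched in a few lines of Python (the record names and `recover` function are made up for illustration): the server logs PREPARE before touching data and COMMIT when it finishes, and on restart it backs out anything that prepared but never committed.

```python
# Names (PREPARE/COMMIT, recover) are invented for this sketch.
def recover(log):
    """Return the transactions to back out: prepared but never committed."""
    prepared, committed = set(), set()
    for rec, txid in log:
        if rec == "PREPARE":
            prepared.add(txid)
        elif rec == "COMMIT":
            committed.add(txid)
    return prepared - committed

# Server crashed between PREPARE and COMMIT of transaction 2:
log = [("PREPARE", 1), ("COMMIT", 1), ("PREPARE", 2)]
to_back_out = recover(log)
```

That answers what the *server* does; it still says nothing about what the client does while it waits.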

> UNIX isn't ready for prime time and it won't ever be because the 
> people involved with it don't understand what data integrity is 
> or what software engineering is. They just put broken shit out 
> there with cutesy names that relies on ten thousand other pieces 
> of shit and pat themselves on the back.

Well, as far as I understand it, much of the idea behind Unix was
to do things nice, simple, and easy to work with. That was on the heels
of Multics (note the name), which did everything in the most complicated
way. 

With a local disk, you might return a fatal write error. On most 
systems there is no return code that says "temporary write failure,
try again later."
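A sketch of that client-side choice in Python (the errno classification here is an assumption; real systems disagree about which errors are transient): retry what looks temporary, surface everything else as a hard failure.

```python
import errno, os

# Sketch only -- not a real NFS client.  Which errnos count as
# transient is an assumption made for illustration.
TRANSIENT = {errno.EINTR, errno.EAGAIN}

def robust_write(fd, data, retries=3):
    """Retry transient write failures; raise on anything fatal (e.g. EIO)."""
    for _ in range(retries):
        try:
            return os.write(fd, data)
        except OSError as e:
            if e.errno not in TRANSIENT:
                raise            # fatal: the caller must decide what to do
    raise OSError(errno.EIO, "gave up after retries")

r, w = os.pipe()
n = robust_write(w, b"hello")    # ordinary case: succeeds first try
```

Note the fatal branch just re-raises; as said above, few programs have anywhere else to put the data.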

> UNIX is borked because robustness was never a part of the design 
> and people said what you just said "How many programs have a 
> way to  handle a write failure by writing the data somewhere else?" 
> instead of saying "data integrity matters and we will always 
> leave your files, databases, etc. in a known, consistent state"

Synchronous writes are a very important part of NFS data integrity.
That is, the client knows for sure that the data has been
written to physical disk. 
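On a local disk the same guarantee can be had with O_SYNC (or an explicit fsync); a small Python sketch: write() does not return until the data is on stable storage, which is what lets the client safely forget its copy.

```python
import os, tempfile

# Local-disk analogue of NFS's synchronous write: with O_SYNC,
# write() returns only after the data has reached stable storage.
path = tempfile.NamedTemporaryFile(delete=False).name
fd = os.open(path, os.O_WRONLY | os.O_SYNC)
written = os.write(fd, b"durable")   # blocks until the data is on disk
os.close(fd)
```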

Soft mounts, as far as I know, return the error to the client
and allow the client to take appropriate action. 

-- glen


