[Info-vax] Current VMS engineering quality, was: Re: What's VMS up to these

John Wallace johnwallace4 at yahoo.co.uk
Tue Mar 13 19:43:43 EDT 2012


On Mar 13, 11:02 pm, JF Mezei <jfmezei.spam... at vaxination.ca> wrote:
> Michael Kraemer wrote:
> > Crashing an entire workstation cluster due to some network problem
> > can hardly be called "the right thing".
>
> Actually, it unfortunately is.  If a network problem results in possible
> cluster partitioning, or nodes that got locked due to loss of quorum but
> whose view of the cluster became stale (no update to locks, logical
> names and all those shared structures), then the best way is to simply
> force that node to reboot to ensure its stale data is not used.
>
> Say you had application X running on the local node. It had a lock on a
> remote node as well as on the local node.
>
> When the link is broken, node X freezes due to loss of quorum. Rest of
> cluster will eventually kick that node out after the recnxinterval
> timeout. When this happens, the surviving cluster will zap all locks
> that X had.
>
> Now, node X still thinks that it has a lock on both the local and remote
> files. The rest of the cluster sees X as having no locks.
>
> If you allowed X to rejoin the cluster, you can't merge the 2 lock
> tables because meanwhile, an application on node Y might have taken a lock
> and started to use that remote file (since that file became lockable
> once node X was declared lost).
>
> This is where the real VMS engineering did shine. They made unpopular
> decisions (such as forcing a crash, or the much hated RWAST and RWMBX
> states) because they did spend the time to think about all the
> implications and saw possibilities where there would be corruption and
> made damned sure that it wouldn't happen.
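
To make the scenario quoted above concrete, here is a minimal sketch in
Python. It assumes a toy in-memory lock table; LockTable, drop_node,
merge_conflicts and simulate are hypothetical names for illustration
only and are not the interfaces of the real VMS distributed lock
manager. All it tries to show is why the stale view on X and the live
view held by the rest of the cluster cannot simply be merged.

```python
class LockTable:
    """Toy lock table (hypothetical): maps a resource name to the node holding it."""

    def __init__(self):
        self.owners = {}

    def acquire(self, resource, node):
        """Grant the lock unless a different node already holds it."""
        holder = self.owners.get(resource)
        if holder is not None and holder != node:
            return False
        self.owners[resource] = node
        return True

    def drop_node(self, node):
        """Discard every lock held by a node that was removed from the cluster."""
        self.owners = {r: n for r, n in self.owners.items() if n != node}


def merge_conflicts(stale, live):
    """Resources that the stale view and the live view assign to different nodes."""
    return sorted(
        r for r, n in stale.owners.items()
        if r in live.owners and live.owners[r] != n
    )


def simulate():
    # Before the link breaks, node X holds both locks.
    cluster = LockTable()
    cluster.acquire("remote_file", "X")
    cluster.acquire("local_file", "X")

    # The link breaks: X hangs on quorum loss and keeps a frozen copy.
    x_view = LockTable()
    x_view.owners = dict(cluster.owners)

    # After the timeout the survivors drop X's locks, and node Y
    # takes the lock on the file that has just become lockable.
    cluster.drop_node("X")
    assert cluster.acquire("remote_file", "Y")

    # A naive merge on rejoin would find two owners for the same resource.
    return merge_conflicts(x_view, cluster)
```

Under these assumptions simulate() reports remote_file as owned by X in
one view and by Y in the other, which is the situation a forced reboot
of X is meant to rule out.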


JF, you seem to be assuming that UNIX-like OSes and UNIX-based apps
and the purchasers and users of such things care about these things
called locks. The vast majority don't care about locks, whereas, as you
rightly note, locks and their behaviour in "unusual"
circumstances are core to the design of VMS.

So, is the lack of proper mandatory locking in UNIX a bug or a
feature?
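
For anyone who hasn't met the distinction: the common UNIX file locks
are advisory, so they only constrain programs that bother to ask for
them. A minimal sketch in Python, assuming a Linux-like system and a
local filesystem (advisory_demo is a hypothetical name; fcntl.flock is
the standard library wrapper around flock(2)):

```python
import fcntl
import os
import tempfile


def advisory_demo():
    """Return (cooperative_blocked, uncooperative_wrote) for one temp file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Two independent opens of the same file stand in for two programs.
        with open(path, "w") as holder, open(path, "w") as other:
            fcntl.flock(holder, fcntl.LOCK_EX)

            # A cooperating writer asks for the lock first and is refused.
            try:
                fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
                cooperative_blocked = False
            except OSError:
                cooperative_blocked = True

            # An uncooperative writer never asks, and nothing stops it.
            other.write("written without holding the lock\n")
            other.flush()
            uncooperative_wrote = os.path.getsize(path) > 0

        return cooperative_blocked, uncooperative_wrote
    finally:
        os.remove(path)
```

On a typical local Linux filesystem the first value should come back
True, but that part depends on the platform and filesystem. The second
value is the point: the write goes through even though another open of
the file holds an exclusive lock.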

The folks who designed VMS and the folks who choose to stay with VMS
see fast distributed locking (and the way it helps avoid undetected
corruption of customer data) as a feature.

The folks who chose UNIX (which in the early mass market days was
usually for workstation-style boxes with no serious shared data to
speak of) found other solutions, and these days they have to accept
the occasional inevitable chaos if/when shared resources do get mixed
up. Mostly they just choose not to have real shared resources. Which
has advantages and disadvantages.

In a sensible world, there'd be room for both approaches. Neither is
universally better, but in any given set of circumstances, one may be
more appropriate than the other.

Put another way, you wouldn't use a Porsche for shifting a ten ton
factory floor press, or a five axle low loader to impress the
girlfriend (well, er, maybe...). And no sensible person would try to
argue that either was "better" universally.

But in IT the choice of OS becomes a matter of quasi-religious faith
rather than a matter of evaluating fitness for purpose for any given
set of tasks. Not sure why, but we are where we are. Probably
something to do with the IT industry these days being a branch of the
fashion industry rather than the engineering-based discipline it once
was.


