[Info-vax] Current VMS engineering quality, was: Re: What's VMS up to these

Fri Mar 16 05:09:16 EDT 2012

On Mar 16, 4:19 am, Michael Kraemer <M.Krae... at gsi.de> wrote:
> Paul Sture schrieb:
>
> > There were a couple of other problems here.  When twenty odd workstations
> > were trying to reboot, when the network came back they all did it more or
> > less at once.
>
> Yep, rather messy.
> Plus, when the network still had persistent/intermittent problems,
> the whole configuration remained dead in the water.
>
> > Until I changed the workstations to use Dump Off System
> > Disk (DOSD), the server system disk would go into a full shadow merge (or
> > was it shadow copy?).  This caused further problems:
>
> > a) the boot times were exceptionally long
> > b) some part of DECnet Phase V could time out and you'd have to reboot
> > the workstation later to recover from this
>
> I don't know if this would have helped in the cases I experienced.
> But I'm rather sure the responsible VMS experts would have tried it,
> if possible. Too embarrassing, a VMS cluster unavailable for hours.
>
> > On the plus side, our team never lost any data from this, but I cannot
> > speak for the DBAs.
>
>

There are loosely coupled distributed system architectures and there
are tightly coupled distributed system architectures. They have
different characteristics and different advantages and disadvantages.

Anyone who tries to force fit the wrong one in any given set of
circumstances is naive, or an idiot, or being forced to follow the
instructions of a naive idiot. It does happen occasionally.

VMS can do loosely coupled distributed systems using various
mechanisms, or it can do (most people's definition of) tightly coupled
distributed systems using local or wide area VMSclusters, wide area
HBVS, etc. With VMS these things are relatively well architected and
can be done relatively transparently to well designed applications and
will work well within their design constraints (and often beyond).
Making effective use of these facilities takes a certain level of
skill if they are to operate well under all conceivable circumstances.

UNIX and desktop-derived OSes can in general do loosely coupled
(exceptions apply). Beyond that there be dragons. But if it's cheap
and the price/performance looks good on paper, who cares?

Well, actually, some people do care.

For certain relatively unusual requirements (such as a dual-redundant
real time SCADA system mentioned earlier by David), it may be that
neither generic tightly coupled system behaviour nor generic loosely
coupled system behaviour is appropriate. The generic tightly coupled
system behaviour introduces too much inter-dependence between
co-operating components and can introduce unacceptable delays during
transient states (e.g. cluster transitions). The generic loosely
coupled system behaviour introduces too much risk of data and control
inconsistency between co-operating systems. So in these niche cases
you end up building your application's own coupling mechanism, which
shares the (SCADA or whatever) data and control information in
real time between applications on different nodes, uses well
characterised mechanisms, and provides exactly the behaviours the
application needs. The OS is left to get on and do its own loosely
coupled thing, largely unaware that there is a closely coupled
application running on top of it.
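
To make that concrete, here is a minimal sketch (Python, purely
illustrative, not anybody's production design) of the standby half of
such an application-level coupling. It assumes the active node sends
sequence-numbered updates and that silence longer than a fixed deadline
means the active node has failed. The names Update, Standby,
takeover_after and the tag "pump1.flow" are all made up for the example,
and the transport between nodes is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    seq: int      # monotonically increasing sequence number from the active node
    tag: str      # hypothetical SCADA point name
    value: float


@dataclass
class Standby:
    """Hot standby mirroring the active node's state at application level."""
    takeover_after: float                       # seconds of silence before takeover
    state: dict = field(default_factory=dict)   # mirrored point values
    last_seq: int = 0
    last_heard: float = 0.0
    active: bool = False

    def receive(self, update: Update, now: float) -> bool:
        """Apply an update; return False if a gap means a full resync is needed."""
        self.last_heard = now
        if update.seq <= self.last_seq:
            return True                  # duplicate or stale update: ignore it
        if update.seq != self.last_seq + 1:
            return False                 # missed an update: state cannot be trusted
        self.state[update.tag] = update.value
        self.last_seq = update.seq
        return True

    def tick(self, now: float) -> None:
        """Called periodically; take control if the active node has gone quiet."""
        if not self.active and now - self.last_heard > self.takeover_after:
            self.active = True


standby = Standby(takeover_after=0.5)
standby.receive(Update(1, "pump1.flow", 12.5), now=0.0)
standby.receive(Update(2, "pump1.flow", 12.7), now=0.1)
standby.tick(now=0.3)   # heard from the active node 0.2 s ago: stay passive
standby.tick(now=0.7)   # silent for 0.6 s: standby takes over
```

The point of the sketch is that the takeover deadline and the handling
of a missed update are chosen by the application, not inherited from
whatever the OS-level cluster or network stack happens to do.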

That's how an engineer (rather than a High Priest in the OS Wars)
might see it anyway.

The "unusable workstations" VMScluster scenario described earlier can
arise when the PHB makes a sensible decision to minimise system
management overhead by managing his workstations as a single
interconnected tightly coupled entity (rather than dozens of
separately managed loosely coupled entities) but then rather naively
chooses not to make the clearly understood necessary investment to
reduce the number of single points of failure in the resulting closely
coupled setup. In such a setup done right, you'd typically have
redundant storage, redundant boot and disk servers, and ideally some
resilience in the network itself. The consequences of not doing so
should be obvious. Proper design of a relatively simple system like
this is not rocket science, but nor is it just a matter of throwing
loosely coupled boxes together.
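
A back-of-envelope way to see what the redundancy buys, sketched in
Python. The 0.99 availability figures are invented for illustration,
and the arithmetic assumes component failures are independent, which
real installations only approximate.

```python
from math import prod


def series(*avail):
    """Everything must be up: each component is a single point of failure."""
    return prod(avail)


def parallel(*avail):
    """Redundant components: the service survives if at least one is up."""
    return 1 - prod(1 - a for a in avail)


# Assumed, illustrative availabilities for one boot/disk server,
# one storage path and one network path.
server, disk, network = 0.99, 0.99, 0.99

# No redundancy: server, storage and network are all single points of failure.
single = series(server, disk, network)

# Each of the three duplicated.
redundant = series(
    parallel(server, server),
    parallel(disk, disk),
    parallel(network, network),
)

print(f"single: {single:.6f}  redundant: {redundant:.6f}")
```

With these made-up numbers the unduplicated setup comes out at roughly
0.970 and the duplicated one at roughly 0.9997.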

Summary: Once again, one size does not fit all. No one OS, or even one
approach, is universally "better". "Better" ALWAYS means better for
some specific set of requirements, even if the reader (or PHB,
whatever) is unwilling to acknowledge this.