[Info-vax] Clustering (was: Re: free shell accounts?)

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Thu Jan 22 13:06:28 EST 2015


On 2015-01-22 06:51:30 +0000, Stan Radford said:

> On 2015-01-20, Matthew H McKenzie <news.deleteme at swellhunter.org> wrote:
>> No issues so far, but it is not a cluster, Deathrow could lose a node and
>> still be usable.

Technically, Deathrow can only lose GEIN (AlphaServer DS10L), as the 
data is presently stored on JACK (Integrity rx2600).  It's configured 
as a primary-secondary pair; JACK has much more storage than GEIN 
does, and there's no shared storage interconnect.

The Deathrow hardware is actually working.   The owner of the cluster 
hasn't sorted out why inbound IP network connections are being 
blocked and aren't reaching the cluster systems.  Of course, the 
difference between operational-but-inaccessible hardware and dead 
hardware is pragmatically rather irrelevant.


>>  Of course recompilation was necessary across architectures
>> and not everything is 100% portable. They have a log of incidents if you
>> wish to look at recent history.
> 
> I don't understand what a cluster does.  If they don't have shared disks
> somewhere wouldn't they have to have multiple copies of everything? How does
> a cluster still remain usable if you are editing a file and the machine the
> file lives on fails? I can see for serving applications a cluster would be
> great but I don't understand how it helps development users. And even that
> would seem like it would take a lot of planning and wouldn't just automatically
> "work" because of the need for shared storage somewhere.

Others have pointed to the clustering manuals in the VMS documentation 
set.   That set includes both a higher-level introductory overview of 
clustering and a lower-level, more detailed clustering manual.  Go 
skim at least the introductory one.

One detail that may not have been mentioned is that host-based volume 
shadowing (HBVS) works across cluster members, so even clusters that 
don't have shared interconnects can have their data transparently 
shadowed (also called mirroring, or RAID-1) across up to six separate 
volumes, with those volumes potentially located on six separate servers.
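
For a flavor of what that looks like, here's a minimal sketch of 
mounting a two-member shadow set; the device names, volume label, and 
logical name here are illustrative assumptions, not Deathrow's actual 
configuration:

  $ ! Hypothetical example: mount shadow set (virtual unit) DSA10:,
  $ ! with one member local to this node and one MSCP-served from
  $ ! another cluster member.
  $ MOUNT /SYSTEM DSA10: /SHADOW=($1$DKA100:,$2$DKA200:) SHADOWVOL SHADOW$DISK

Writes go to both members; if the server holding one member leaves the 
cluster, the shadow set continues on the surviving member, and the 
returning member gets caught up when it rejoins.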

Had a shared storage interconnect been available in the 
currently-two-node Deathrow cluster, any of the cluster members 
connected on that shared interconnect could continue to operate, and, 
so long as quorum is met, cluster members can enter and exit the 
cluster without affecting the shared data.   With a two-node 
configuration, maintaining quorum requires either a primary-secondary 
configuration, or a shared cluster interconnect with what's called a 
quorum disk; multi-host parallel SCSI, for instance, is allowed as a 
shared interconnect in certain supported hardware configurations.  
Probably the easiest quorum scheme to understand is a cluster with 
three or more voting members.   Two-host clusters with no quorum disk 
are more fragile, as the cluster can't automatically differentiate a 
disconnection from a host being down.   With either three or more 
voting members, or two members plus a quorum disk on a shared 
interconnect, the cluster can transparently survive the loss of any 
single host.  The more members, the more losses the cluster can 
survive before it drops below quorum and (intentionally) stalls to 
preserve the data.
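
To make the vote arithmetic concrete, here's a hypothetical 
MODPARAMS.DAT fragment for each node of a two-node cluster with a 
quorum disk; the device name and vote counts are assumptions for 
illustration.  VMS computes quorum as (EXPECTED_VOTES + 2) / 2, 
truncated:

  ! Hypothetical per-node vote settings; run AUTOGEN and reboot
  ! after editing.
  VOTES = 1                  ! this node contributes one vote
  EXPECTED_VOTES = 3         ! two nodes at one vote each, plus the quorum disk
  DISK_QUORUM = "$1$DGA5"    ! quorum disk on the shared interconnect
  QDSKVOTES = 1              ! votes the quorum disk contributes

With all three votes present, quorum is (3 + 2) / 2 = 2, so losing any 
one vote (either host, or the quorum disk) leaves two votes and the 
cluster keeps running; losing two stalls it, by design.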

Is there planning and hardware involved in clustering?  Sure.   Less 
than you might assume, unless you're planning to try this without 
having skimmed the VMS manuals.


-- 
Pure Personal Opinion | HoffmanLabs LLC



