[Info-vax] Wide area cluster, metro area network, seeking info
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Tue Jun 15 10:44:51 EDT 2021
On 2021-06-15 06:51:00 +0000, Michael Moroney said:
> On 6/14/2021 10:39 PM, Mark Berryman wrote:
>> It was certainly supported by Digital when they gave me the set up back
>> in the 80s.
AFAIK, no. That would not have been supported.
Quoth the current Cluster Systems tome:
"2.3.10. Rules for Specifying Quorum
For the quorum disk's votes to be counted in the total cluster votes,
the following conditions must be met:
On all computers capable of becoming [quorum disk] watchers, you must
specify the same physical device name as a value for the DISK_QUORUM
system parameter. The remaining computers (which must have a blank
value for DISK_QUORUM) recognize the name specified by the first quorum
disk watcher with which they communicate."
Either blank when not watching, or the same physical device when watching.
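That rule reduces to a one-line consistency check: among the members that set DISK_QUORUM at all, every non-blank value must name the same physical device. A minimal sketch in Python (node names and device names here are hypothetical examples, not output from any real SYSGEN interface):

```python
def disk_quorum_consistent(settings):
    """Check the DISK_QUORUM rule across cluster members.

    settings maps node name -> DISK_QUORUM value ("" for non-watchers).
    Valid iff all non-blank values name the same physical device.
    """
    watcher_devices = {dev for dev in settings.values() if dev.strip()}
    return len(watcher_devices) <= 1

# Hypothetical two watchers naming the same device, plus one non-watcher: OK.
ok = disk_quorum_consistent({"NODEA": "$1$DGA10", "NODEB": "$1$DGA10", "NODEC": ""})
# Disparate non-blank values: exactly the misconfiguration discussed below.
bad = disk_quorum_consistent({"NODEA": "$1$DGA10", "NODEB": "$1$DGA20", "NODEC": ""})
```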
This whole cluster area needs UI work, and preferably a UI overhaul, as
the current cluster UI "design" is a manual assembly of accreted
kludges; but that's fodder for another time.
> I worked for the VMS Cluster IO group for Digipaqard (and really still
> do the equivalent at VSI but all my recent work is unrelated x86 stuff)
> and there was no way our group would EVER approve that. The cluster
> "quorum hang" exists for a reason, to explicitly avoid the "split
> brain" situation if the channel goes down.
OpenVMS development did some testing of partitioning with
deliberately-wrong configurations back ~Y2K, and—when we deliberately
misconfigured the parameters and configured the shared hardware for
effect—we could trash entire system disks and entire cluster storage
configurations during boot, before getting to the Username: prompt.
System and app startup alone was enough to seriously corrupt persistent
storage, when uncoordinated.
> My guess is that you had a salesperson who knew of this "trick" to "get
> around" the quorum hang "problem". I agree with Hoff that the 'two
> quorum disk definitions' behavior is a bug and I'll try to reproduce it
> and enter a problem report if it still exists.
>
> The risks of a split-brain cluster are VERY implementation dependent.
> Anything from harmless to scrambling your data to destroying a chunk of
> your chemical plant or something as the two clusters try to do two
> incompatible things with valves or something.
Best case, this particular configuration wedges (entirely) or crashes
(entirely) if (when?) that data link is re-established. Worst case, it
doesn't, and corruptions ensue. A cluster member shouldn't even boot
and be allowed to join the "club" with disparate, non-blank values for
DISK_QUORUM.
AFAIK, the official recommendation would have been more hosts and/or
more connections, and/or using the IPC handler or AMDS/AM/etc to select
a lobe. But not disparate quorum disks.
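The "more hosts and/or votes" recommendation falls out of the published quorum arithmetic: quorum is derived from EXPECTED_VOTES as (EXPECTED_VOTES + 2) / 2, using integer division. A quick sketch (a simplification of the documented formula, not of what the connection manager actually executes):

```python
def quorum(expected_votes: int) -> int:
    # Published OpenVMS derivation: quorum = (EXPECTED_VOTES + 2) // 2.
    return (expected_votes + 2) // 2

# Two nodes, one vote each: quorum is 2, so losing either node
# hangs the survivor below quorum.
assert quorum(2) == 2
# Add a single quorum disk vote: EXPECTED_VOTES becomes 3, quorum stays 2,
# and the surviving node plus the disk (2 votes) keeps running, while a
# partitioned lone node without the disk (1 vote) correctly hangs.
assert quorum(3) == 2
```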
--
Pure Personal Opinion | HoffmanLabs LLC