[Info-vax] troubles with quorum disk

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Thu Jan 10 20:02:52 EST 2013


QDSKVOTES=1 would have been better, yes.  Any value will work, though; 
an odd value is preferable.
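
For a two-node cluster, MODPARAMS.DAT entries along these lines would 
do it (a sketch; the quorum disk device name here is only a 
placeholder, so substitute your own, and run AUTOGEN afterward):

    VOTES = 1                    ! one vote per host
    QDSKVOTES = 1                ! one vote for the quorum disk
    EXPECTED_VOTES = 3           ! 2 host votes + 1 quorum disk vote
    DISK_QUORUM = "$1$DGA100"    ! placeholder device name

Quorum works out to (3 + 2)/2 truncated, which is 2, so either host 
plus the quorum disk can keep running on its own.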

On 2013-01-10 23:40:18 +0000, Michael Moroney said:

> Keith Parris <keithparris_deletethis at yahoo.com> writes:
> 
>> QDSKVOTES=3 would indeed allow the survival of one host in a 2-node
>> cluster, but then so would any value of QDSKVOTES greater than zero.
>> However, any value over 1 makes the quorum disk into a single point of
>> failure for the cluster.

True.

Though with dinky FC clusters, having the quorum disk around (usually 
also doubling as the system disk) on an EVA generally only leaves you 
dependent on the Fibre Channel and the EVA, components you're already 
dependent on.

> 
>> The rule of thumb is: if you want to be able to boot a cluster with any
>> single node by itself up to as many as all the nodes, and never want to
>> worry about quorum, and you also don't want the quorum disk itself to
>> become a single point of failure, then set the quorum disk's votes to
>> one less (n-1) than the number of votes from the VMS systems. This
>> allows the cluster to survive loss of the quorum disk as long as all the
>> VMS nodes are present.

Yes.
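
As a quick sanity check on that rule (assuming the usual one vote per 
host), a three-node cluster would look like:

    VOTES = 1             ! on each of the three hosts
    QDSKVOTES = 2         ! n-1, with n = 3
    EXPECTED_VOTES = 5    ! 3 host votes + 2 quorum disk votes

Quorum is (EXPECTED_VOTES + 2)/2 truncated, or 3 here.  One host plus 
the quorum disk musters 1 + 2 = 3 votes and stays up, and all three 
hosts with the quorum disk dead also muster 3 votes and stay up.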

> That is correct.  If you have a cluster with n nodes and you want it to
> be able to run with any combination of the nodes (from 1 to all), set
> EXPECTED_VOTES to 2n-1, each node to 1 and the quorum disk to n-1.
> For the trivial case of a 2 node cluster, it becomes 1 vote each.
> 
> The quorum disk isn't a single point of failure, but it almost is.

True.
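
To see where a running cluster actually stands, a bit of DCL along 
these lines will show the current arithmetic (these F$GETSYI item 
codes are from memory; verify them against HELP on your system):

    $! Sketch: display the cluster's current quorum arithmetic.
    $ WRITE SYS$OUTPUT "Votes present:   ", F$GETSYI("CLUSTER_VOTES")
    $ WRITE SYS$OUTPUT "Quorum required: ", F$GETSYI("CLUSTER_QUORUM")
    $ WRITE SYS$OUTPUT "Expected votes:  ", F$GETSYI("CLUSTER_EVOTES")

If the votes present sit exactly at quorum, losing the quorum disk 
takes the cluster down with it.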

Though the quorum disk is far less of an issue with modern storage 
controllers and controller-based RAID, and an EVA can provide RAID.  
Times move on, and very few production sites aren't already running 
RAID-capable gear.

As for uptime, clustering has various single points of failure within 
itself.  I've had VMS hosts jam themselves due to various 
weirdnesses.  Sometimes the distributed lock manager.  Sometimes the 
EVA freaks out.  Sometimes the FC or the switch.  Clustering can be a 
good approach for VMS, and it might be the only available approach 
short of rolling your own code or (as is often the case) using a 
replication-capable database.  But if you need uptime and predictable 
response time, or if you need to scale up[1], then VMS-style clustering 
may not be the best approach.


> The cluster can continue without it _only_ if all nodes are present.
> (and you can't shadow the quorum disk)

Loosely-coupled clustering (rather more akin to BASE than to ACID), or 
a software quorum server box ("quorum toaster"), would have been 
nice-to-have features for OpenVMS (a quorum VMS box works, but is 
expensive), but I digress.  RAID 6, RAID 10, or better generally works 
just fine for the quorum disk.

————
[1] Officially, the cluster host limit is 96 hosts.  In practice, the 
limiting value might be higher or lower, and it's usually dependent on 
how busy certain cluster-visible objects are.  It's very easy to 
constrain your aggregate performance behind the performance of your 
I/O or storage, for instance.

-- 
Pure Personal Opinion | HoffmanLabs LLC



