[Info-vax] Quorum disk woes
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Sun Jun 2 11:49:31 EDT 2019
On 2019-06-02 09:55:59 +0000, Marc Van Dyck said:
> Unless I completely misunderstood, expected_votes is the number of
> votes observed when the whole cluster is up.
I'd swap the word "observed" for either "intended" or "expected" in that.
The number of votes observed is determined dynamically, based on the
voting members (and quorum disk, if any) actually present.
EXPECTED_VOTES is a mechanism that provides a lower bound and an
initial default for the total number of votes, prior to the connection
manager and its connections being established.
Once the cluster connections are established, the number of votes
present and the calculated value for quorum float to reflect the votes
actually present in the configuration.
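As a sketch of the arithmetic only, with illustrative numbers rather
than a recommendation: the connection manager derives quorum from the
larger of EXPECTED_VOTES and the total votes it actually observes,
roughly as

    quorum = (total votes + 2) / 2    ! integer division, fraction dropped

    EXPECTED_VOTES = 3                ->  initial quorum = (3 + 2) / 2 = 2
    EXPECTED_VOTES = 1, but three votes actually present
                                      ->  quorum floats up to 2 anyway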
There are folks that try to use EXPECTED_VOTES to fake the cluster
connection manager and the voting, and that can end badly.
Particularly for cases where connections cannot be established. It'll
look like it works. Until it doesn't. And it'll end with corruptions.
> If each member as one vote, and the quorum disk has one vote too, then
> expected_votes should be 3.
This particular configuration is constrained by the requirements. It
is not going to be a robust configuration, and it is going to be less
than what it could be. That is all dictated by the stated app
requirements found in that other thread of yours that's directly
related to this topic:
https://groups.google.com/d/msg/comp.os.vms/tc0v7lnTLSo/GDvyor2SAQAJ
The requirement for a single and consistent primary host means a
different voting pattern, means the quorum disk is unnecessary, and
means somebody will get to learn the IPC handler as a way to free up
a stuck secondary. With a primary-secondary configuration, there's
no need for a quorum disk, and no need for votes on the secondary.
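As a sketch only, with hypothetical node names and the usual AUTOGEN
pass (@SYS$UPDATE:AUTOGEN GETDATA SETPARAMS, or through REBOOT) to
apply the changes, that voting pattern looks roughly like this in
MODPARAMS.DAT:

    ! primary node (hypothetical name NODEA) -- MODPARAMS.DAT
    VOTES = 1
    EXPECTED_VOTES = 1
    DISK_QUORUM = " "      ! no quorum disk

    ! secondary node (hypothetical name NODEB) -- MODPARAMS.DAT
    VOTES = 0
    EXPECTED_VOTES = 1
    DISK_QUORUM = " "      ! no quorum disk

The primary boots and runs on its own vote, and the secondary can come
and go without ever affecting quorum.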
> Then for such a configuration the calculated quorum is 2. When booting
> one member and with the quorum disk present, we have 2 votes so the
> quorum is gained and the cluster can be alive.
Or one vote and one expected vote, and the secondary can join and can
depart the "cluster" at any time. It can share resources. But it
can't boot and operate without the primary present (one of the stated
requirements) short of manual intervention. That manual intervention
will switch host names and switch votes and related baggage, which
will allow the secondary to impersonate the usual primary, and will
allow the software with the hard-coded host names and the rest to
operate appropriately in the failover configuration. And with the IPC
handler the secondary can clear the quorum hang of the "departed"
primary for a look around, or the secondary can be rebooted with a
different persona and can poke around prior to the manual failover.
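For illustration only, and hedged because the exact console steps vary
by platform and should be checked against the OpenVMS Cluster Systems
manual: clearing a quorum hang from the console of a hung member means
halting it, requesting a software interrupt at IPL C to reach the IPC>
prompt, and recalculating quorum there. Very roughly:

    ! halt the node from its console (Ctrl/P), then request IPL C
    >>> D SIRR C           ! Alpha-style; VAX uses D/I 14 C
    >>> CONTINUE
    IPC> Q                 ! recalculate quorum to the votes now present
    IPC> <Ctrl/Z>          ! leave the IPC handler and resume

From a member that still has a usable DCL prompt, SET
CLUSTER/EXPECTED_VOTES will also recompute quorum after a voting
member has legitimately departed.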
> Regarding the visibility of the quorum disk, I think it's OK but just
> to be sure, tomorrow I will extract a console log and post it here.
The quorum disk is unnecessary here. If it's present, it needs to be
present and configured on all hosts with direct access, which would be
all hosts connected to the FC SAN if that's where the quorum disk is
located.
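If a quorum disk were kept anyway, the sketch (with a hypothetical FC
device name) is the same few MODPARAMS.DAT lines on every host with a
direct path to it, again followed by AUTOGEN:

    ! every host with a direct path to the quorum disk
    DISK_QUORUM = "$1$DGA100"    ! hypothetical FC device name
    QDSKVOTES = 1
    EXPECTED_VOTES = 3           ! two voting members plus the quorum disk

Hosts without a direct path to the disk can't act as quorum disk
watchers and can't count its vote.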
Your app software is incompatible with the app failover capabilities of
a cluster. Clustering does not provide a mechanism to fix this. If
you want what you want, you cannot have the cluster configuration
you're trying for here.
The OpenVMS app development doc is simply awful at describing how to
code the apps for what you want, too. And how to not code the apps.
The doc does describe all the pieces, but not how it all fits together.
And the frameworks and tools are just weak, such as those around the
DLM and around common tasks including distributed job scheduling, and
app and system fail-overs. These are not easy tasks to code properly,
either.
--
Pure Personal Opinion | HoffmanLabs LLC