[Info-vax] Wide area cluster, metro area network, seeking info

Michael Moroney moroney at world.std.spaamtrap.com
Tue Jun 15 02:51:00 EDT 2021


On 6/14/2021 10:39 PM, Mark Berryman wrote:
> On 6/14/21 1:56 PM, Michael Moroney wrote:
>> On 6/14/2021 12:44 AM, Phillip Helbig (undress to reply) wrote:
>>> In article <sa6ge0$sra$1 at dont-email.me>, Mark Berryman
>>> <mark at theberrymans.com> writes:
>>>
>>>> On 6/12/21 1:01 AM, Phillip Helbig (undress to reply) wrote:
>>>>> In article <sa11hi$cpc$1 at dont-email.me>, Mark Berryman
>>>>> <mark at theberrymans.com> writes:
>>>>>
>>>>>> First, I recommend you ignore the suggestions to add a 3rd node to 
>>>>>> your
>>>>>> cluster.  In your situation, it is not really a viable answer.
>>>>>
>>>>> It would solve most of the problems you mention below, and also could
>>>>> serve as a test node.
>>>>>
>>>>>> There are configurations that will allow a member of a 2-node 
>>>>>> cluster to
>>>>>> automatically continue in the event that the other node fails.
>>>>>
>>>>> How?  If one has more votes, it is essential.  If the votes are the
>>>>> same, both are essential.  Unless you mean a quorum disk.  But it 
>>>>> should
>>>>> be at a third location.
>>>>>
>>>>
>>>> Situation: two separate nodes with no shared storage
>>>>
>>>> Configure each node with one vote.  Configure each node to use a local
>>>> disk as the quorum disk, also with one vote.
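
A sketch of what such a setup might look like in each node's
MODPARAMS.DAT (the device names and the EXPECTED_VOTES value below are
illustrative assumptions, not taken from the original configuration):

    ! Node A: quorum disk is a disk local to node A
    VOTES = 1
    EXPECTED_VOTES = 3           ! 1 per node + 1 for a quorum disk
    DISK_QUORUM = "$1$DGA100"    ! hypothetical local device name
    QDSKVOTES = 1

    ! Node B: quorum disk is a different disk, local to node B
    VOTES = 1
    EXPECTED_VOTES = 3
    DISK_QUORUM = "$2$DGA200"    ! hypothetical local device name
    QDSKVOTES = 1

(AUTOGEN would then fold these into the active system parameters.)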
>>>>
>>>> As the cluster is formed, the nodes will discover that they do not 
>>>> agree
>>>> on the quorum disk and will exclude it, resulting in quorum being
>>>> established with the 2 votes provided by the nodes.
>>>>
>>>> One node goes down, the other pauses while it recomputes quorum.  In
>>>> doing so it discovers there is no longer a conflict regarding the 
>>>> quorum
>>>> disk so it includes it, resulting in two votes which re-achieves quorum
>>>> and the node continues.
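
The arithmetic behind that (OpenVMS computes quorum as
(EXPECTED_VOTES + 2) / 2 with integer division; the value of 3 below is
my assumption for illustration, matching the sketch above):

    quorum = (3 + 2) / 2 = 2
    both nodes up, quorum disk excluded:  1 + 1 node votes = 2  -> quorum held
    one node down, quorum disk included:  1 node + 1 disk  = 2  -> quorum held
    a lone node with no quorum disk:      1 vote < 2           -> quorum hang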
>>>>
>>>> When the failed node comes back up, the quorum disks will be excluded
>>>> again and the cluster will return to its original state.  The danger of
>>>> this configuration is that, if the communication channel between the 
>>>> two
>>>> nodes is lost but the nodes remain up, the cluster will partition.  
>>>> This
>>>> is addressed in my original posting.
>>>
>>> Is this scenario supported?
>>>
>> No!
> 
> According to whom?
> 
> It was certainly supported by Digital when they gave me the set up back 
> in the 80s.

I worked in the VMS Cluster I/O group at Digipaqard (and still do the 
equivalent at VSI, though all my recent work is unrelated x86 stuff), 
and there is no way our group would EVER have approved that.  The 
cluster "quorum hang" exists for a reason: to avoid exactly the "split 
brain" situation that arises if the communication channel goes down.

My guess is that you had a salesperson who knew of this "trick" to "get 
around" the quorum hang "problem".  I agree with Hoff that the 'two 
quorum disk definitions' behavior is a bug; I'll try to reproduce it 
and, if it still exists, enter a problem report.

The risks of a split-brain cluster are VERY implementation dependent: 
anything from harmless, to scrambling your data, to destroying a chunk 
of your chemical plant as the two halves of the cluster try to do 
incompatible things with the same valves.


