[Info-vax] Wide area cluster, metro area network, seeking info

Sat Jun 12 00:48:54 EDT 2021

On 6/11/2021 6:10 PM, Mark Berryman via Info-vax wrote:
> On 6/8/21 4:28 PM, Rich Jordan wrote:
>> We are looking at the possibility of putting VMS boxes in two
>> locations, with Integrity boxes running VSI VMS.  This is the very
>> beginning of the research on the possibility of clustering those two
>> servers instead of just having them networked.  Probably have to be
>> master/slave since only two nodes and no shared storage.
>>
>> After reviewing the various cluster docs, they seem to be focused on
>> older technologies like SONET and DS3 using FDDI bridges (which would
>> allow shared storage).  The prospect has a metropolitan area network
>> but I do not have any specs on that as yet.
>>
>> Are there available docs relevant to running a distributed VMS cluster
>> over a metro area network or fast/big enough VPN tunnel?  Or is that
>> just the straight cluster over IP configuration in the docs (which
>> we've never used) that we need to concentrate on?
>>
>> Thanks
>>
>
> First, I recommend you ignore the suggestions to add a 3rd node to your
> cluster.  In your situation, it is not really a viable answer.
>
> There are configurations that will allow a member of a 2-node cluster to
> automatically continue in the event that the other node fails.  However,
> if you lose the communication channel but both nodes stay up, the
> cluster will partition and then you have to be really careful about how
> you reform the cluster.  Because of this, I tend not to recommend this
> particular solution except in very specific circumstances.
> (That is, circumstances where you can guarantee that the correct node
> becomes the shadow master when the cluster reforms, and where you
> haven't been writing different data to each node.)
>
> As far as I can tell from your description, the only way clustering
> would be a viable answer for you would be if you also did HBVS.  In that
> case, simply build a 2-node cluster with enough identical disks such
> that all of the data you want present at the backup site can be placed
> on a host-based volume set.  HBVS will then keep the data at both sites
> in sync.
>
> Failure modes in this scenario:
>
> 1. Loss of the communication channel.  In this case, both nodes will
> hang (for the duration of the cluster timeout parameters).  More
> specifically, each will freeze any process that attempts a write to
> disk.  As long as the communication channel comes back up before the
> cluster times out, everything will resume automatically.  If it doesn't,
> both nodes should take a CLUEXIT bugcheck.  Once the communication
> channel is back up, you then bring each node back up as appropriate.
>
> 2. Loss of one node.  In this case the other node will hang.  Manual
> intervention is required to get it going again (specifically, a couple
> of commands at the console to reset quorum).  At that point, everything
> simply resumes on that node.
>
> The main reason for doing it this way is that the response to any
> failure becomes a human decision.  In the event of any node or
> communication failure, any surviving cluster members will simply stop
> until you tell them what to do.  The intent is to prevent the wrong
> node from becoming the shadow master when (or if) the cluster is
> reformed.
>
> Since you are in contact with VSI, I have no doubt they will cover this
> type of scenario with you.  This is presented merely as an idea to
> generate questions as part of your discussions.
>
> Mark Berryman

This is all very interesting.  If this were being done with virtual
machines, it would seem that the cheap solution would be to run the
"secondary" system in a minimal configuration, to keep down license
costs, since the only work it would be doing is handling HBVS.  In the event of a
disaster at the primary site, like an earthquake leveling the building,
just configure new hardware at the secondary site and spin up a new
primary with the volumes from the secondary.  Why pay for compute
resources and the extra power and cooling at the secondary site, which
you will likely never use?
Of course, this is hardware and has likely already been purchased, so
there are no real savings to be had.
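The hangs in both of Mark's failure modes come from OpenVMS quorum voting. What follows is a simplified sketch, not VSI's implementation. It assumes the documented rule that quorum is (EXPECTED_VOTES + 2) / 2, rounded down, and it ignores quorum disks and dynamic quorum adjustment. The node names SITEA and SITEB are hypothetical. The "primary" case is one assumed reading of the master/slave idea from the original question, in which only one node carries a vote.

```python
# Simplified sketch of OpenVMS cluster quorum arithmetic.
# Assumption: quorum = (EXPECTED_VOTES + 2) // 2 (integer division);
# quorum disks and dynamic quorum adjustment are ignored.
# SITEA/SITEB are hypothetical node names.

def quorum(expected_votes: int) -> int:
    """Minimum votes a set of connected nodes needs to keep running."""
    return (expected_votes + 2) // 2

def survivors_run(votes: dict[str, int], present: list[str]) -> bool:
    """True if the nodes in `present` hold enough votes to continue."""
    expected = sum(votes.values())  # EXPECTED_VOTES assumed = total votes
    return sum(votes[n] for n in present) >= quorum(expected)

# Two peers with one vote each (the HBVS setup described above).
peers = {"SITEA": 1, "SITEB": 1}
print(quorum(sum(peers.values())))        # 2
print(survivors_run(peers, ["SITEA"]))    # False: a lone node hangs

# Assumed "master/slave" variant: only SITEA carries a vote.
primary = {"SITEA": 1, "SITEB": 0}
print(survivors_run(primary, ["SITEA"]))  # True: SITEA continues alone
print(survivors_run(primary, ["SITEB"]))  # False: SITEB hangs
```

With two one-vote peers, losing the link leaves each side holding one of two votes. Both checks therefore return False, which matches both nodes hanging in failure mode 1. When one node dies, the survivor is in the same position, which is why failure mode 2 needs the manual quorum reset.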