[Info-vax] Cluster hang on node reboot
Martin Vorlaender
martinvorlaender at gmail.com
Thu Jun 16 07:28:23 EDT 2016
Hi all!
I have experienced an issue with a customer's VMS cluster that I have no explanation for.
The cluster consists of two rx2800 i2 systems plus one DS25 as the quorum node. The
rx2800s' HBVS disks are provided by two 3PAR arrays. The rx2800s are running
VMS V8.4 + UPDATE V11.0 + FIBRE_SCSI V9.0.
When one of the rx2800s reboots and rejoins the cluster, the entire cluster hangs
for two minutes. During that time I ran a TCP/IP ping from the remaining rx2800 to
another system, which didn't lose a packet, and another DCL session still took
commands. However, a simple SHOW DEVICE D issued during the hang blocked and only
returned its expected output afterwards. There were no OPCOM messages during that
period (the last one before it was the cluster transition completion).
I suspect that access to the (shadowed) common system disk is blocked for those
2 minutes. I had minimerge enabled and DOSD parameters set up, but without actually
moving the dump files off the system disk (i.e. DUMPFILE_DEVICE set to the DGA devices
of the system disk shadow set). I have switched it off since then, but haven't had
a chance to test whether that was the reason for the hang.
Any ideas?
TIA,
Martin