[Info-vax] Cluster hang on node reboot

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Thu Jun 16 09:36:39 EDT 2016


On 2016-06-16 11:28:23 +0000, Martin Vorlaender said:

> I have experienced an issue with a customer's VMS cluster I have no 
> explanation for.
> ...
> Any ideas?

Check the consoles on the other systems around the time of the cluster 
reconfiguration, looking for votes and related messages, and check the 
shutdown output and the startup output or logging for clues.  Look 
specifically for errors, for disk dismounts that were not clean (with 
the ensuing rebuilds), and at the timing of the cluster transition 
messages.

Check that the cluster parameters are consistent across all hosts, the 
cluster timers are set appropriately, that the VAXCLUSTER parameter is 
set to 2 everywhere, and that EXPECTED_VOTES is (presumably) 3 
everywhere.
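
As a rough sketch of that consistency check, assuming the parameter 
values have already been collected from each host by hand (for instance 
from SYSGEN or SYSMAN output), something like the following Python 
flags any parameter that differs between nodes.  The node names, the 
sample values, and the function names are hypothetical and only for 
illustration.  The quorum helper uses the usual OpenVMS derivation of 
quorum from EXPECTED_VOTES, (EXPECTED_VOTES + 2) / 2 truncated.

```python
from collections import defaultdict


def find_inconsistent(params_by_node):
    """Return {parameter: {value: [nodes]}} for every parameter whose
    value is not identical on all nodes that report it."""
    seen = defaultdict(lambda: defaultdict(list))
    for node, params in params_by_node.items():
        for name, value in params.items():
            seen[name][value].append(node)
    # Keep only the parameters that show more than one distinct value.
    return {name: dict(values) for name, values in seen.items()
            if len(values) > 1}


def quorum(expected_votes):
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2,
    truncated to an integer."""
    return (expected_votes + 2) // 2


# Hypothetical values, transcribed by hand from each node.
cluster = {
    "NODEA": {"VAXCLUSTER": 2, "EXPECTED_VOTES": 3},
    "NODEB": {"VAXCLUSTER": 2, "EXPECTED_VOTES": 3},
    "NODEC": {"VAXCLUSTER": 2, "EXPECTED_VOTES": 2},  # deliberate mismatch
}

if __name__ == "__main__":
    for name, values in find_inconsistent(cluster).items():
        print(name, "differs:", values)
    print("quorum for EXPECTED_VOTES=3:", quorum(3))
```

With EXPECTED_VOTES at 3 everywhere, quorum(3) works out to 2, so the 
cluster should keep running with one single-vote member down.  A 
parameter that is missing entirely on one node is not flagged by this 
sketch; only differing values are.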

Also check the error logs themselves, as I've seen some Itanium systems 
logging blizzards of errors there that did not show up via SHOW ERROR.  
Probably not the case here, but worth a look.

I'll assume that the AlphaServer DS20 shares absolutely nothing with 
the cluster, save for the cluster group identifiers and password.

The OpenVMS cluster management and configuration user interface is 
unfortunately rather poor, and the logging is all over the place, so 
this troubleshooting tends to be a slog.




-- 
Pure Personal Opinion | HoffmanLabs LLC 



