[Info-vax] notification upon reboot

Bob Gezelter gezelter at rlgsc.com
Wed Feb 27 09:31:41 EST 2013


On Tuesday, February 26, 2013 1:42:59 PM UTC-5, pcov... at gmail.com wrote:
> hi, 
> 
> 
> 
> I am looking for some ideas on being notified when a system restarts, or even crashes... I am on Itaniums running 8.3-1H1. One suggestion is an email during startup. That could work; unfortunately, knowing the system was down for 5 hours would have been nice too. :-(
> 
> 
> 
> thanks
> 
> Paul

Paul,

As has been noted, node uptime is not necessarily what one wants to monitor. Membership in the cluster is interesting, but a disconnect from the outside world can leave the cluster nominally up yet effectively down for its users.

Within a cluster, there are two ways to detect a node going down directly:

- receive a copy of the OPCOM message reporting that the node is down
- use the Lock Manager to detect a node leaving the cluster (having a process that holds exclusive access to a special file implicitly relies on the Lock Manager)

For operational purposes, the outside reachability of the node is the relevant parameter. A simple script on a cheap virtual host at a cloud facility (e.g., Amazon or GoDaddy) that runs wget against each node can show whether the entire structure is functioning (e.g., a cluster that is up while Apache/WASD is down is still a system down for most purposes).
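
As a rough illustration of such an outside check, here is a minimal sketch in Python rather than wget. The node names and URLs in NODES are hypothetical placeholders, and the helper names is_reachable and down_nodes are made up for this example; it assumes each node serves something over HTTP that is worth fetching.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical nodes; substitute the real externally visible addresses.
NODES = {
    "node-a": "http://node-a.example.com/",
    "node-b": "http://node-b.example.com/",
}


def is_reachable(url, timeout=10.0):
    """Return True if an HTTP GET to url completes with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers DNS failures, refused connections, timeouts, HTTP error
        # statuses (raised as HTTPError), and malformed URLs.
        return False


def down_nodes(nodes, probe=is_reachable):
    """Return the sorted names of nodes whose URL did not answer the probe."""
    return sorted(name for name, url in nodes.items() if not probe(url))
```

Run from cron on the outside host, something like down_nodes(NODES) gives the list of nodes to raise an alert for. Logging the time of each failed and each successful run would also show roughly how long a node was unreachable, to within the polling interval.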

There are also a variety of publicly available uptime monitoring services that accomplish almost precisely this.

- Bob Gezelter, http://www.rlgsc.com


