[Info-vax] notification upon reboot
Bob Gezelter
gezelter at rlgsc.com
Wed Feb 27 09:31:41 EST 2013
On Tuesday, February 26, 2013 1:42:59 PM UTC-5, pcov... at gmail.com wrote:
> hi,
>
> I am looking for some ideas as to being notified of a system restarting, or even crashing... I am on Itaniums running 8.3-1H1. One suggestion is an email during the startup. That could work; unfortunately, knowing the system was down for 5 hours would have been nice too. :-(
>
> thanks
>
> Paul
Paul,
As has been noted, node uptime is not necessarily what one wants to monitor. Membership in the cluster is interesting, but a disconnect from the outside world can leave the cluster up internally while it is effectively down for its users.
Within a cluster, there are two ways to detect node down directly:
- receive a copy of the OPCOM message reflecting node down
- use the Lock Manager to detect a node leaving the cluster (having a process hold a special file exclusively is implicitly relying on the Lock Manager)
For operational purposes, the outside reachability of the node is the relevant parameter. A simple script on a cheap virtual host at a cloud facility (e.g., Amazon or GoDaddy) doing a wget against each node can check whether the entire structure is functioning (e.g., cluster up but Apache/WASD down is still system down for most purposes).
There are also a variety of uptime monitoring services publicly available that accomplish almost precisely this.
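As a rough illustration of the external check, here is a minimal sketch in Python rather than a shell script around wget. The node names and URLs are hypothetical placeholders, and the function names are mine; it assumes each node serves a page over HTTP and that any 2xx/3xx answer counts as "up". Run it from a scheduler on the outside host and hook whatever notification you prefer (mail, SMS gateway) into main().

```python
import urllib.error
import urllib.request

# Hypothetical node names and URLs (assumptions for illustration only);
# substitute the externally visible address of each node.
NODES = {
    "NODEA": "http://nodea.example.com/",
    "NODEB": "http://nodeb.example.com/",
}


def is_reachable(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if an HTTP GET of url ends with a 2xx or 3xx status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers DNS failure, refused connection, timeout, HTTP 4xx/5xx,
        # and malformed URLs: all treated as "not reachable".
        return False


def unreachable_nodes(nodes, check=is_reachable):
    """Return the sorted names of the nodes whose URL failed the check."""
    return sorted(name for name, url in nodes.items() if not check(url))


def main():
    """Print one alert line per node that did not answer."""
    for name in unreachable_nodes(NODES):
        print(f"ALERT: {name} did not answer at {NODES[name]}")
```

Because the check runs outside the cluster, it also catches the case where the nodes are up but the network path or the web server is not. A single failed probe can be a transient glitch, so in practice one would probably alert only after a few consecutive failures.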
- Bob Gezelter, http://www.rlgsc.com