[Info-vax] delay in startup
Phillip Helbig---undress to reply
helbig at astro.multiCLOTHESvax.de
Sun Mar 25 07:39:49 EDT 2012
In article
<27617031.3022.1332623951163.JavaMail.geo-discussion-forums at vbtv42>,
FrankS <sapienza at noesys.com> writes:
> > 13-MAR-2012 17:35:38.00 MINNIM 13-MAR-2012 17:38:31.84
> > 13-MAR-2012 14:46:40.00 JANDER 13-MAR-2012 17:38:59.19
> >
> > The three minutes for MINNIM between booting and writing to the file is
> > about what I expect. JANDER waited for MINNIM to boot (quorum is 2), so
> > that explains the almost three-hour wait between booting and writing for
> > JANDER, but why did MINNIM boot almost three hours after JANDER? All
> > the nodes lost power at the same time.
>
> What do I win if I guess correctly?
Eternal fame in comp.os.vms.
> Power failed around 14:46.
>
> Jander crashed when quorum was lost, but its battery backup survived
> the entire (approximately) three hour outage. Therefore, it rebooted
> immediately and then went into a wait until quorum could be regained.
>
> Minnim had no battery backup and crashed hard (and cold).
>
> Power was restored around 17:35.
>
> Minnim reboots. Quorum is restored. Due to variability in getting
> through the startup procedure Minnim writes the "I'm alive" message
> before Jander does. I'm speculating that a third node in the cluster
> may have been responsible for part of the delay.
Interesting theory, but not correct in my case. Power went off about
12:45 and came back about 14:45. Thus, the reboot time for JANDER
reflects power coming back on. Also, I'm pretty sure the battery is
just to keep the TOY clock and possibly the console settings. (JANDER
is an XP1000; perhaps the console settings can survive without a battery
here, but not the TOY clock. The other two are PWSs and definitely need
a battery for the console settings, since without one they forget they
are supposed to boot VMS. The third system, a PWS, is the one with the
bad battery, since it didn't come back up at all until I set up the
console for VMS again when I returned.) I don't think I've ever had a
system with batteries to enable it to survive power going off (that
would be a UPS, presumably). (I think there were also some systems with
batteries to keep the contents of RAM without power, hence the
distinction between RESTART and REBOOT, but I don't think I've ever had
such a system either.)
So it looks as though, when power came back, JANDER rebooted immediately
but MINNIM did not. I'm not sure exactly when in the startup sequence
BOOTTIME is recorded, but it is pretty early. I don't see what could have
changed after almost three hours to cause MINNIM to boot. One difference,
of course, is that JANDER perhaps re-formed the cluster whereas MINNIM
re-joined the cluster. However, any differences in behaviour here (such
as waiting for served disks on other nodes to become available so that
they can be mounted in a shadow set) should lead to much smaller delays.