[Info-vax] Unpleasant Disk Shadowing Surprise
Phillip Helbig---undress to reply
helbig at astro.multiCLOTHESvax.de
Wed Oct 12 14:44:06 EDT 2011
In article <4e95c170$0$28524$c3e8da3$9b4ff22a at news.astraweb.com>, JF
Mezei <jfmezei.spamnot at vaxination.ca> writes:
> Bob Koehler wrote:
>
> > If your definition of real-time can't handle 3 minutes of
> > interruption, then you probably need to engineer a different solution
> > than the kind of shadowing approach you're using now.
>
> I seem to recall being told that VMS would seamlessly continue to run
> after the loss of a disk.
For some definitions of "seamlessly". I once had a system disk crash,
back when I didn't have it shadowed (how I could sleep at night then I
don't know). VMS continued fine---no problems until it had to access
the disk.

Last night I had some sort of crash (haven't analysed it yet;
everything is back now). One machine (of three in the cluster, each
with one vote) was accessible; SHOW SYSTEM/CLUSTER showed the usual
processes etc. However, the system disk (a shadow set) of this node
was in mount verification. I couldn't mount it, because doing so would
itself require access to the system disk. I couldn't find any way to
reboot it other than powering it down. There is one bizarre side
effect:
SYSMAN> do write sys$output f$getsyi("boottime")
%SYSMAN-I-OUTPUT, command execution on node MINNIM
11-OCT-2011 22:19:51.00
%SYSMAN-I-OUTPUT, command execution on node JANDER
11-OCT-2011 22:25:38.00
%SYSMAN-I-OUTPUT, command execution on node LEEBIG
1-JAN-2015 00:06:43.00
SYSMAN> conf sh time
System time on node MINNIM: 12-OCT-2011 20:39:17.32
System time on node JANDER: 12-OCT-2011 20:39:17.55
System time on node LEEBIG: 12-OCT-2011 20:39:17.74
SYSMAN> conf set time
SYSMAN> conf sh time
System time on node MINNIM: 12-OCT-2011 20:39:25.88
System time on node JANDER: 12-OCT-2011 20:39:25.90
System time on node LEEBIG: 12-OCT-2011 20:39:25.91
SYSMAN> do write sys$output f$getsyi("boottime")
%SYSMAN-I-OUTPUT, command execution on node MINNIM
11-OCT-2011 22:19:51.00
%SYSMAN-I-OUTPUT, command execution on node JANDER
11-OCT-2011 22:25:38.00
%SYSMAN-I-OUTPUT, command execution on node LEEBIG
1-JAN-2015 00:06:43.00
Both the date and the time are way off for LEEBIG. How could this
possibly happen?
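For what it's worth, a boot time that lies in the future relative to the node's own clock is easy to flag mechanically. Here is a rough Python sketch, assuming the values are copied by hand from the SYSMAN output above and that the locale uses English month abbreviations; the function names are made up for illustration and are not part of any VMS tool:

```python
from datetime import datetime

# VMS absolute time format, e.g. "11-OCT-2011 22:19:51.00"
VMS_FORMAT = "%d-%b-%Y %H:%M:%S.%f"


def parse_vms_time(text):
    """Parse a VMS absolute time string into a datetime (hypothetical helper)."""
    # title() turns "OCT" into "Oct" so that %b matches the English abbreviation.
    return datetime.strptime(text.strip().title(), VMS_FORMAT)


def future_boot_times(boot_times, system_times):
    """Return the nodes whose reported boot time is later than their system time."""
    suspicious = []
    for node, boot_text in boot_times.items():
        if parse_vms_time(boot_text) > parse_vms_time(system_times[node]):
            suspicious.append(node)
    return suspicious


# Values copied from the SYSMAN session above.
BOOT_TIMES = {
    "MINNIM": "11-OCT-2011 22:19:51.00",
    "JANDER": "11-OCT-2011 22:25:38.00",
    "LEEBIG": "1-JAN-2015 00:06:43.00",
}
SYSTEM_TIMES = {
    "MINNIM": "12-OCT-2011 20:39:25.88",
    "JANDER": "12-OCT-2011 20:39:25.90",
    "LEEBIG": "12-OCT-2011 20:39:25.91",
}

if __name__ == "__main__":
    for node in future_boot_times(BOOT_TIMES, SYSTEM_TIMES):
        print(f"{node}: boot time is later than current system time")
```

This only detects the symptom; it says nothing about why the value was wrong or why setting the cluster time did not change it.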
> For proper fault tolerance, you would want to have 2 SCSI controllers.
And have the members of the shadow set connected to different nodes in
the cluster.