[Info-vax] Boot drive died on a shadowed system disk
Rich Jordan
jordan at ccs4vms.com
Thu May 17 12:36:07 EDT 2012
Based on the docs I think we're OK, but it's the first time this has
happened, so if anyone knows for certain please feel free to comment.
DS10. OpenVMS V8.2, ECOs current. Two channel KZPEA SCSI controller,
two drives on each channel; drive DKA0 is the console selected boot
disk. System shadow disk DSA0 contains DKA0 and DKB0, the data unit
DSA1 has drives DKA100 and DKB100. Console AUTO_ACTION is RESTART.
DKA0 failed out of the shadowset with hard errors (DKB0 failed out of
DSA1 shortly thereafter, but was able to rejoin manually). DKA0 will
not remount into DSA0 (got the following error):
$ MOUNT/SYSTEM DSA0 /SHADOW=$1$DKA0: ALPHASYS /CONFIRM
%MOUNT-I-MOUNTED, ALPHASYS mounted on _DSA0:
%MOUNT-I-SHDWMEMFAIL, _$1$DKA0: (NODE) failed as a member of the
shadow set.
-SYSTEM-F-ABORT, abort
%MOUNT-I-ISAMBR, _$1$DKB0: (NODE) is a member of the shadow set
No errors were logged against the DKA0 device from this mount attempt
but one bus error on PKA0 was. We're not certain yet which component
or components are at fault (a support call is being placed).
I can mount DKA0 locally/writelocked and have run an analyze/disk on
it (with some cleanup indicated as needed).
I suppose I could mount it /override=shadow, then dismount and try to
have it rejoin the set, but I don't think it's trustworthy, so I'm not
going to try.
My question is this. In the event of a reboot before service can be
performed, what will happen? My expectation based on the shadow docs
is one of two outcomes, either of which is survivable.
DKA0 is nonbootable: the system just fails at console level, and can
either have its console boot device changed to DKB0 or just manually
booted from DKB0.
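For reference, the console side of that first case would be roughly the
following. This is a sketch from memory of the SRM console commands,
assuming the console names the surviving member DKB0 and that the
existing boot flags are left alone. To check and then change the
default boot device:
>>> SHOW BOOTDEF_DEV
>>> SET BOOTDEF_DEV DKB0
>>> BOOT
Or, for a one-time manual boot that leaves the default untouched:
>>> BOOT DKB0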
DKA0 is at least nominally bootable: the system starts to boot, sees
the shadow info (so long as I don't mount it /OVERRIDE=SHADOW!), looks
to DKB0 and sees the severe mismatch and that DKA0 was not a valid
member of the set. It then fails the boot with a SHADBOOTFAIL
bugcheck and someone onsite still has to manually boot from DKB0.
I don't see a way for the system to actually come up on the outdated
DKA0: disk, just boot failures if it goes down. Is this correct?