[Info-vax] Boot drive died on a shadowed system disk
Rich Jordan
jordan at ccs4vms.com
Thu May 17 16:20:04 EDT 2012
On May 17, 11:36 am, Rich Jordan <jor... at ccs4vms.com> wrote:
> Based on the docs I think we're OK, but it's the first time this has
> happened, so if anyone knows for certain please feel free to comment.
>
> DS10. OpenVMS V8.2, ECOs current. Two channel KZPEA SCSI controller,
> two drives on each channel; drive DKA0 is the console selected boot
> disk. System shadow disk DSA0 contains DKA0 and DKB0, the data unit
> DSA1 has drives DKA100 and DKB100. Console AUTO_ACTION is RESTART.
>
> DKA0 failed out of the shadowset with hard errors (DKB0 failed out of
> DSA1 shortly thereafter, but was able to rejoin manually). DKA0 will
> not remount into DSA0 (got the following error):
>
> $ MOUNT/SYSTEM DSA0 /SHADOW=$1$DKA0: ALPHASYS /CONFIRM
> %MOUNT-I-MOUNTED, ALPHASYS mounted on _DSA0:
> %MOUNT-I-SHDWMEMFAIL, _$1$DKA0: (NODE) failed as a member of the
> shadow set.
> -SYSTEM-F-ABORT, abort
> %MOUNT-I-ISAMBR, _$1$DKB0: (NODE) is a member of the shadow set
>
> No errors were logged against the DKA0 device from this mount attempt
> but one bus error on PKA0 was. We're not certain yet which component
> or components are at fault (a support call is being placed).
>
> I can mount DKA0 locally/writelocked and have run an analyze/disk on
> it (with some cleanup indicated as needed).
>
> I suppose I could mount it /OVERRIDE=SHADOW, then dismount and try to
> have it rejoin the set, but I don't think it's trustworthy, so I'm not
> going to try.
>
> My question is this. In the event of a reboot before service can be
> performed, what will happen? My expectation based on the shadow docs
> is one of two outcomes, either of which is survivable.
>
> DKA0 is nonbootable: the system just fails at console level, and can
> either have its console boot device changed to DKB0 or just manually
> booted from DKB0.
>
> DKA0 is at least nominally bootable: the system starts to boot, sees
> the shadow info (so long as I don't mount it /OVERRIDE=SHADOW!), looks
> to DKB0 and sees the severe mismatch and that DKA0 was not a valid
> member of the set. It then fails the boot with a SHADBOOTFAIL
> bugcheck and someone onsite still has to manually boot from DKB0.
>
> I don't see a way for the system to actually come up on the outdated
> DKA0: disk. Just boot failures if it goes down. Is this correct?
Finally got the log to a WSEA-equipped box. Perhaps the KZPEA has
failed, since the log seems to call that out. I've not seen this before;
could it still be the result of a failing disk or perhaps an
overheated disk (if the cage fan has failed)?
If more of the log output is needed I'll be happy to post it; this was
just a snapshot showing the failure callout.
Thanks for any insights.
====================
emb_Device_Number 0
emb_func 0
emb_name_len 10
emb_name FPO001$PKA
emb_dtname_len 0
emb_dtname
KZPEA_2
KZPEA_LW_CNT 90
pka_erl_b_rev x0032 packet revision 2
pka_sub_packet_class x1389 PCI-SCSI SubPacket
pka_sub_packet_type x0002 OVMS SubPacket
KZPEA_ErrCode x0402 Adapter Hardware Failure
SubType[7:0] x2 Runtime Error
Type[15:8] x4
pka_pci_bus 0
pka_pci_slot 15
pka_vendor_id x9005
pka_device_id x00C0 KZPEA Ultra 3 Dual Port
pka_subsystem_vendor_id x9005
pka_subsystem_id xF620
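
For anyone reading the error code by hand: going only by the field labels in the snapshot above, KZPEA_ErrCode x0402 appears to split into a high byte (Type[15:8]) and a low byte (SubType[7:0]). Here is a minimal Python sketch of that split, assuming that layout. The function name split_err_code is made up for illustration and is not part of WSEA or any OpenVMS tool.

```python
def split_err_code(code):
    """Split a 16-bit error code into (type, subtype).

    Assumes the layout implied by the log labels:
    Type in bits 15:8, SubType in bits 7:0.
    """
    err_type = (code >> 8) & 0xFF   # high byte
    subtype = code & 0xFF           # low byte
    return err_type, subtype


err_type, subtype = split_err_code(0x0402)
print(f"Type[15:8] = x{err_type:X}, SubType[7:0] = x{subtype:X}")
```

For x0402 this gives Type x4 and SubType x2, matching the two lines the log prints under the error code. What those values mean (here "Adapter Hardware Failure" and "Runtime Error") comes from the log's own decoding, not from this sketch.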