[Info-vax] Boot drive died on a shadowed system disk

Rich Jordan jordan at ccs4vms.com
Thu May 17 16:20:04 EDT 2012


On May 17, 11:36 am, Rich Jordan <jor... at ccs4vms.com> wrote:
> Based on the docs I think we're OK, but it's the first time this has
> happened, so if anyone knows for certain, please feel free to comment.
>
> DS10. OpenVMS V8.2, ECOs current.  Two channel KZPEA SCSI controller,
> two drives on each channel; drive DKA0 is the console selected boot
> disk.  System shadow disk DSA0 contains DKA0 and DKB0, the data unit
> DSA1 has drives DKA100 and DKB100.  Console AUTO_ACTION is RESTART.
>
> DKA0 failed out of the shadowset with hard errors (DKB0 failed out of
> DSA1 shortly thereafter, but was able to rejoin manually).  DKA0 will
> not remount into DSA0 (got the following error):
>
> $ MOUNT/SYSTEM  DSA0 /SHADOW=$1$DKA0: ALPHASYS /CONFIRM
> %MOUNT-I-MOUNTED, ALPHASYS mounted on _DSA0:
> %MOUNT-I-SHDWMEMFAIL, _$1$DKA0: (NODE) failed as a member of the
> shadow set.
> -SYSTEM-F-ABORT, abort
> %MOUNT-I-ISAMBR, _$1$DKB0: (NODE) is a member of the shadow set
>
> No errors were logged against DKA0 from this mount attempt, but one
> bus error was logged on PKA0.  We're not certain yet which component
> or components are at fault (a support call is being placed).
>
> I can mount DKA0 locally/writelocked and have run an analyze/disk on
> it (with some cleanup indicated as needed).
>
> I suppose I could mount it /OVERRIDE=SHADOW, then dismount it and try
> to have it rejoin the set, but I don't think it's trustworthy, so I'm
> not going to try.
>
> My question is this.  In the event of a reboot before service can be
> performed, what will happen?  My expectation based on the shadow docs
> is one of two, either of which are survivable.
>
> DKA0 is nonbootable:  the system just fails at console level, and can
> either have its console boot device changed to DKB0 or just manually
> booted from DKB0.
>
> DKA0 is at least nominally bootable:  the system starts to boot, sees
> the shadow info (so long as I don't mount it /OVERRIDE=SHADOW!), looks
> to DKB0, sees the severe mismatch and that DKA0 was not a valid
> member of the set.  It then fails the boot with a SHADBOOTFAIL
> bugcheck and someone onsite still has to manually boot from DKB0.
>
> I don't see a way for the system to actually come up on the outdated
> DKA0: disk, just boot failures if it goes down.  Is this correct?

Finally got the log to a WSEA-equipped box.  Perhaps the KZPEA has
failed, since the log seems to call that out.  I've not seen this error
before; could it still be the result of a failing disk, or perhaps an
overheated disk (if the cage fan has failed)?

If more of the log output is needed I'll be happy to post it; this was
just a snapshot showing the failure callout.

Thanks for any insights.

====================

emb_Device_Number       0
emb_func                0
emb_name_len            10
emb_name               FPO001$PKA
emb_dtname_len          0
emb_dtname

KZPEA_2
KZPEA_LW_CNT            90
pka_erl_b_rev          x0032                   packet revision 2
pka_sub_packet_class   x1389                   PCI-SCSI SubPacket
pka_sub_packet_type    x0002                   OVMS SubPacket
KZPEA_ErrCode          x0402                   Adapter Hardware Failure
   SubType[7:0]        x2                      Runtime Error
   Type[15:8]          x4
pka_pci_bus             0
pka_pci_slot            15
pka_vendor_id          x9005
pka_device_id          x00C0                   KZPEA Ultra 3 Dual Port
pka_subsystem_vendor_id x9005
pka_subsystem_id       xF620
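For anyone reading the dump: going only by the field labels WSEA prints, KZPEA_ErrCode x0402 appears to pack two fields, Type in bits 15:8 (x4) and SubType in bits 7:0 (x2, "Runtime Error").  A minimal sketch of that split, assuming the bit layout is exactly what the log labels say (the function name is hypothetical, and no mapping of codes to meanings beyond what the log shows is implied):

```python
def decode_kzpea_errcode(code: int) -> tuple[int, int]:
    """Split a 16-bit KZPEA_ErrCode into (Type, SubType).

    Assumes the layout printed in the WSEA output:
    Type = bits 15:8, SubType = bits 7:0.
    """
    err_type = (code >> 8) & 0xFF  # upper byte: Type[15:8]
    sub_type = code & 0xFF         # lower byte: SubType[7:0]
    return err_type, sub_type


t, s = decode_kzpea_errcode(0x0402)
print(f"Type=x{t:X} SubType=x{s:X}")  # Type=x4 SubType=x2
```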