[Info-vax] Boot drive died on a shadowed system disk
Rich Jordan
jordan at ccs4vms.com
Thu May 17 14:39:51 EDT 2012
On May 17, 1:18 pm, Jan-Erik Soderholm <jan-erik.soderh... at telia.com>
wrote:
> Rich Jordan wrote 2012-05-17 18:36:
>
> > Based on the docs I think we're ok but it's the first time this has
> > happened so if anyone knows for certain please feel free to comment.
>
> > DS10. OpenVMS V8.2, ECOs current. Two channel KZPEA SCSI controller,
> > two drives on each channel; drive DKA0 is the console selected boot
> > disk. System shadow disk DSA0 contains DKA0 and DKB0, the data unit
> > DSA1 has drives DKA100 and DKB100. Console AUTO_ACTION is RESTART.
>
> > DKA0 failed out of the shadowset with hard errors (DKB0 failed out of
> > DSA1 shortly thereafter, but was able to rejoin manually). DKA0 will
> > not remount into DSA0 (got the following error):
>
> > $ MOUNT/SYSTEM DSA0 /SHADOW=$1$DKA0: ALPHASYS /CONFIRM
> > %MOUNT-I-MOUNTED, ALPHASYS mounted on _DSA0:
> > %MOUNT-I-SHDWMEMFAIL, _$1$DKA0: (NODE) failed as a member of the
> > shadow set.
> > -SYSTEM-F-ABORT, abort
> > %MOUNT-I-ISAMBR, _$1$DKB0: (NODE) is a member of the shadow set
>
> > No errors were logged against the DKA0 device from this mount attempt
> > but one bus error on PKA0 was. We're not certain yet which component
> > or components are at fault (a support call is being placed).
>
> > I can mount DKA0 locally/writelocked and have run an analyze/disk on
> > it (with some cleanup indicated as needed).
>
> > I suppose I could mount it /override=shadow then dismount and try to
> > have it rejoin the set but I don't think it's trustworthy so not going
> > to try.
>
> > My question is this. In the event of a reboot before service can be
> > performed, what will happen? My expectation based on the shadow docs
> > is one of two outcomes, either of which is survivable.
>
> > DKA0 is nonbootable: the system just fails at console level, and can
> > either have its console boot device changed to DKB0 or just manually
> > booted from DKB0.
>
> > DKA0 is at least nominally bootable: the system starts to boot, sees
> > the shadow info (so long as I don't mount it /OVERRIDE=SHADOW!), looks
> > to DKB0 and sees the severe mismatch and that DKA0 was not a valid
> > member of the set. It then fails the boot with a SHADBOOTFAIL
> > bugcheck and someone onsite still has to manually boot from DKB0.
>
> > I don't see a way for the system to actually come up on the outdated
> > DKA0: disk. Just boot failures if it goes down. Is this correct?
>
> Can't the DS10 have two devices as default boot device?
> The thought is to have a system that always boots when one
> of the system disks is "gone".
>
> Why do you want to boot a disk that is "dead"?
> Boot from DKB0 and replace DKA0.
The system is remote and normally reboots automatically in the event
of a power failure. Until we get hands on it, it's going to do what it's
going to do. My concern is a power failure or crash right now, before we
can determine the problem component and replace it (could be the drive,
could be the KZPEA, could be the front access cage...).
Since the second drive on the bus is still logging errors and sending
the data shadow set into mount verification every 45 minutes or so, we
also dismounted that set's A channel disk. But even a bad DKA0 alone can
cause bus-wide symptoms.
I'll check the console manual for multiple boot devices. It's still
going to take an onsite hands-on to update the console though.
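
A minimal sketch of what that console change might look like, assuming
the DS10's SRM firmware accepts a comma-separated device list for
bootdef_dev (still to be confirmed against the console manual) and that
the short names dka0 and dkb0 resolve to the two system disk members:

    >>> show bootdef_dev
    >>> set bootdef_dev dka0,dkb0
    >>> show auto_action

If the list is accepted, the console should try dka0 first and fall back
to dkb0. Until someone is at the console, the manual alternative is a
one-off boot from the good member:

    >>> boot dkb0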