[Info-vax] Unpleasant Disk Shadowing Surprise
Robert A. Brooks
rab at aitchpee.com
Tue Oct 11 13:25:15 EDT 2011
On 10/11/2011 10:53 AM, tadamsmar wrote:
> After the incident I found there was a disk error on one member of the
> shadow set.
> According to the console log, more than 30 seconds after the watchdog
> sounded, the shadow set changed state, the offending disk went
> offline, a mount verification started and completed.
>
> Immediately after the mount verification completed, VMS started
> working again.
>
> Looks like the disk system was inaccessible for about 3 minutes and
> any process that tried to use it got halted somehow.
>
> Is this to be expected?
Yes, this is expected. Once a shadow set goes into mount verification,
all I/O is queued up; all members are treated as suspect until proven good.
Mount verification is more complicated for a shadow set than for
a "scalar" (non-shadowed) device; see the SYSGEN parameters
SHADOW_MBR_TMO and MVTIMEOUT for details.
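
To make the "all I/O is queued up" behaviour concrete, here is a minimal
toy model in Python. It is only a sketch, not how the shadowing driver is
implemented: the class and method names (ToyShadowSet, submit, and so on)
are hypothetical, and the timeouts governed by SHADOW_MBR_TMO and
MVTIMEOUT are not modelled at all. It shows only that requests issued
during mount verification stall rather than fail, and complete once
verification ends.

```python
from collections import deque


class ToyShadowSet:
    """Hypothetical toy model: I/O issued during mount verification is
    queued (the caller stalls) instead of being failed."""

    def __init__(self):
        self.in_mount_verification = False
        self.pending = deque()   # requests stalled behind verification
        self.completed = []      # requests that have finished, in order

    def submit(self, request):
        if self.in_mount_verification:
            self.pending.append(request)
            return "queued"
        self.completed.append(request)
        return "done"

    def begin_mount_verification(self):
        self.in_mount_verification = True

    def end_mount_verification(self):
        # Verification finished: release everything that was stalled.
        self.in_mount_verification = False
        while self.pending:
            self.completed.append(self.pending.popleft())


shadow_set = ToyShadowSet()
first = shadow_set.submit("write A")
shadow_set.begin_mount_verification()
stalled = [shadow_set.submit("read B"), shadow_set.submit("write C")]
shadow_set.end_mount_verification()
print(first, stalled, shadow_set.completed)
```

From the application's point of view the stalled requests simply take a
long time, which matches the "VMS started working again" observation.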
Note that a partially-failing disk can be a big problem, in that the
device will whipsaw in and out of mount verification. In more bizarre
cases, we've seen I/O reads work but writes fail.
This is a big problem, because mount verification only reads the disk,
so the device will quickly exit mount verification, only to reenter upon
the retry of the failing write.
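
The whipsaw in the reads-work/writes-fail case can be sketched with a
small Python toy. Everything here is an assumption made for illustration:
ToyDisk, mount_verify, issue_write and the max_retries limit are
hypothetical names, and real mount verification is far more involved.
The only point is that a read-only check cannot detect a write-only
fault, so every retry of the write sends the device back into
verification.

```python
class ToyDisk:
    """Hypothetical partially-failing member: reads succeed, writes fail."""

    def read(self):
        return True

    def write(self):
        return False


def mount_verify(disk):
    # The check only reads the disk, so a write-only fault passes it.
    return disk.read()


def issue_write(disk, max_retries):
    """Attempt a write, entering the toy mount verification after each
    failure. Returns (write_succeeded, mount_verification_entries)."""
    entries = 0
    for _ in range(max_retries):
        if disk.write():
            return True, entries
        entries += 1                 # write failed: enter verification
        if not mount_verify(disk):
            return False, entries    # device really is gone
        # Verification passed, so the loop retries the same write.
    return False, entries


ok, cycles = issue_write(ToyDisk(), max_retries=5)
print(ok, cycles)
```

In this toy the write never succeeds, and the device enters verification
once per retry until the (hypothetical) retry limit is reached.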
Most I/O errors will trigger mount verification; a few, such as
SS$_DRVERR, will not, and are returned immediately to the caller.
That is relatively rare, however, and is not what happened in your
case.
-- Rob