[Info-vax] Unpleasant Disk Shadowing Surprise

Robert A. Brooks rab at aitchpee.com
Tue Oct 11 13:25:15 EDT 2011


On 10/11/2011 10:53 AM, tadamsmar wrote:

> After the incident I found there was a disk error on one member of the
> shadow set.

> According to the console log, more than 30 seconds after the watchdog
> sounded, the shadow set changed state, the offending disk went
> offline, a mount verification started and completed.
>
> Immediately after the mount verification completed, VMS started
> working again.
>
> Looks like the disk system was inaccessible for about 3 minutes and
> any process that tried to use it got halted somehow.
>
> Is this to be expected?

Yes, this is expected.  Once a shadow set goes into mount verification, 
all I/O is queued up; all members are treated as suspect until proven good.
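The queuing behavior can be pictured with a toy Python model. It is not VMS code. The class ToyShadowSet and its methods are hypothetical. The model captures only one behavior: requests that arrive while the set is in mount verification are held, then released in order once verification completes.

```python
from collections import deque


class ToyShadowSet:
    """Hypothetical model: I/O issued during mount verification is held."""

    def __init__(self) -> None:
        self.in_mount_verification = False
        self.pending: deque = deque()
        self.completed: list = []

    def submit(self, request: str) -> None:
        if self.in_mount_verification:
            # The requester simply waits; nothing is sent to any member.
            self.pending.append(request)
        else:
            self.completed.append(request)

    def start_mount_verification(self) -> None:
        self.in_mount_verification = True

    def finish_mount_verification(self) -> None:
        self.in_mount_verification = False
        # Release held requests in arrival order.
        while self.pending:
            self.completed.append(self.pending.popleft())


shadow = ToyShadowSet()
shadow.submit("read block 10")
shadow.start_mount_verification()
shadow.submit("write block 20")
shadow.submit("read block 30")
held = list(shadow.pending)
shadow.finish_mount_verification()
```

This also explains why processes appeared to be halted in the original question. Their I/O sat in the queue until mount verification completed.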

Mount verification of a shadow set is more complicated than mount
verification of a "scalar" (non-shadowed) device; see the SYSGEN
parameters SHADOW_MBR_TMO and MVTIMEOUT for details.

Note that a partially failing disk can be a big problem, because the 
device will whipsaw in and out of mount verification.  In a more bizarre 
case, we've seen I/O reads work but writes fail.  This is particularly 
troublesome because mount verification only reads the disk, so the 
device will quickly exit mount verification, only to reenter it on the 
retry of the failing write.
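The whipsaw can be illustrated with an equally hypothetical model. HalfBrokenDisk, mount_verify and retry_write are invented names. The sketch assumes, as described above, that verification only reads the disk while the pending write keeps being retried. The max_cycles cap exists only to make the toy loop terminate.

```python
class HalfBrokenDisk:
    """Hypothetical failing disk: every read succeeds, every write fails."""

    def read(self) -> bool:
        return True

    def write(self) -> bool:
        return False


def mount_verify(disk: HalfBrokenDisk) -> bool:
    # Assumption from the post: verification only reads the disk.
    return disk.read()


def retry_write(disk: HalfBrokenDisk, max_cycles: int) -> list:
    """Retry one write, recording each trip through mount verification."""
    events = []
    for _ in range(max_cycles):
        if disk.write():
            events.append("write ok")
            return events
        events.append("enter mount verification")
        if not mount_verify(disk):
            events.append("verification failed")
            return events
        # Reads succeed, so verification passes and the write is retried.
        events.append("exit mount verification")
    events.append("gave up")  # toy cap so the loop ends
    return events


events = retry_write(HalfBrokenDisk(), max_cycles=3)
```

Each pass through the loop is one in/out cycle. The read-only check keeps declaring the disk good, while the write that actually needs to succeed never does.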

Most I/O errors will trigger mount verification; a few, such as 
SS$_DRVERR, will not trigger verification and are returned immediately 
to the caller.  That is not what happened in your case, however, and it 
is relatively rare.
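The error classification reduces to a simple dispatch. In this sketch, only SS$_DRVERR comes from the post. The set IMMEDIATE_RETURN and the placeholder status HYPOTHETICAL_ERROR are invented for illustration. The real driver and shadowing logic is not reproduced here.

```python
# Hypothetical classification; only "SS$_DRVERR" is taken from the post.
IMMEDIATE_RETURN = {"SS$_DRVERR"}


def handle_io_error(status: str) -> str:
    """Decide what a failed I/O does in this toy model."""
    if status in IMMEDIATE_RETURN:
        # Returned straight to the caller; no mount verification.
        return "return to caller"
    # Most other errors send the device into mount verification.
    return "enter mount verification"


drverr_action = handle_io_error("SS$_DRVERR")
other_action = handle_io_error("HYPOTHETICAL_ERROR")  # placeholder status
```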

				-- Rob


