[Info-vax] Unpleasant Disk Shadowing Surprise
    Robert A. Brooks 
    rab at aitchpee.com
       
    Tue Oct 11 13:25:15 EDT 2011
    
    
  
On 10/11/2011 10:53 AM, tadamsmar wrote:
> After the incident I found there was a disk error on one member of the
> shadow set.
> According to the console log, more than 30 seconds after the watchdog
> sounded, the shadow set changed state, the offending disk went
> offline, a mount verification started and completed.
>
> Immediately after the mount verification completed, VMS started
> working again.
>
> Looks like the disk system was inaccessible for about 3 minutes and
> any process that tried to use it got halted somehow.
>
> Is this to be expected?
Yes, this is expected.  Once a shadow set goes into mount verification,
all I/O is queued up; all members are treated as suspect until proven good.
Mount verification on a shadow set is more complicated than on a "scalar"
(non-shadowed) device; see the SYSGEN parameters SHADOW_MBR_TMO and
MVTIMEOUT for details.
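To picture the stall, here is a toy Python model (not VMS code; the class, methods and device names are hypothetical). It shows a shadow set that queues every I/O while in mount verification, treats all members as suspect, and releases the queue only once each member has either answered or been dropped. It does not model the SHADOW_MBR_TMO or MVTIMEOUT timers.

```python
from collections import deque


class ShadowSet:
    """Toy model of a shadow set during mount verification (not VMS internals)."""

    def __init__(self, members):
        # Hypothetical member name -> True if that disk currently answers I/O.
        self.members = dict(members)
        self.in_mount_verification = False
        self.pending = deque()   # I/O held while mount verification is active
        self.completed = []      # I/O that has actually been serviced

    def submit(self, request):
        if self.in_mount_verification:
            # Nothing reaches any member until verification is over.
            self.pending.append(request)
        else:
            self.completed.append(request)

    def start_mount_verification(self):
        self.in_mount_verification = True

    def finish_mount_verification(self):
        # Every member is suspect: keep only those that answer, drop the rest.
        survivors = {name: ok for name, ok in self.members.items() if ok}
        if not survivors:
            raise RuntimeError("no member passed verification")
        self.members = survivors
        self.in_mount_verification = False
        # Release the queued I/O in its original order.
        while self.pending:
            self.completed.append(self.pending.popleft())


shadow = ShadowSet({"DKA100": True, "DKA200": False})
shadow.start_mount_verification()
shadow.submit("read block 42")
shadow.submit("write block 7")
print(shadow.completed)          # []  (callers are stalled)
shadow.finish_mount_verification()
print(sorted(shadow.members))    # ['DKA100']
print(shadow.completed)          # ['read block 42', 'write block 7']
```

In this model, processes appear to hang for exactly as long as verification lasts, and they resume once the bad member is removed. That matches the sequence in the console log quoted above.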
Note that a partially failing disk can be a big problem, because the
device will whipsaw in and out of mount verification.  In one more bizarre
case, we've seen reads work but writes fail.
This is a big problem because mount verification only reads the disk,
so the device quickly exits mount verification, only to reenter it when
the failing write is retried.
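The whipsaw follows from mount verification only reading the disk. The toy Python model below is not VMS code, and its function and parameter names are hypothetical. A retry cap stands in for whatever eventually breaks the loop on a real system. With reads succeeding and writes failing, every read-only verification passes and every retried write sends the device straight back into mount verification.

```python
def retry_write_with_mount_verification(reads_ok, writes_ok, max_attempts=5):
    """Return the events seen by one write on a partially failing disk.

    Toy model only: verification is assumed to pass whenever a read
    succeeds.  max_attempts is a hypothetical cap so the loop terminates.
    """
    events = []
    for _ in range(max_attempts):
        if writes_ok:
            events.append("write ok")
            return events
        events.append("write failed -> enter mount verification")
        if not reads_ok:
            # Verification itself cannot complete; the device stays in it.
            events.append("verification read failed -> stay in mount verification")
            return events
        # Verification only reads the disk, so it succeeds...
        events.append("verification read ok -> exit mount verification")
        # ...and the next iteration retries the same write, which fails again.
    events.append("gave up after max_attempts")
    return events


for event in retry_write_with_mount_verification(reads_ok=True, writes_ok=False,
                                                 max_attempts=3):
    print(event)
```

With `reads_ok=True, writes_ok=False` the output alternates between entering and exiting mount verification on every attempt, which is the whipsaw described above.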
Most I/O errors will trigger mount verification; a few, such as
SS$_DRVERR, will not, and are returned immediately to the caller.
That is not what happened in your case, however, and it is relatively rare.
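The split between the two kinds of errors can be sketched as a simple dispatch. This is a toy Python illustration, not how the VMS driver stack is coded. SS$_DRVERR is the only code named in the post. The set below is deliberately incomplete, and the other status string is a hypothetical placeholder.

```python
# Status codes that bypass mount verification and go straight back to the
# caller.  SS$_DRVERR is the one example from the post; the real list is
# longer and is not reproduced here.
RETURNED_IMMEDIATELY = {"SS$_DRVERR"}


def handle_io_error(status):
    """Toy dispatcher: decide what happens to a failed I/O request."""
    if status in RETURNED_IMMEDIATELY:
        return "return status to caller"
    # Most other errors put the device into mount verification.
    return "enter mount verification"


print(handle_io_error("SS$_DRVERR"))          # return status to caller
print(handle_io_error("HYPOTHETICAL_ERROR"))  # enter mount verification
```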
				-- Rob