[Info-vax] Unpleasant Disk Shadowing Surprise

Jan-Erik Soderholm jan-erik.soderholm at telia.com
Tue Oct 11 13:12:56 EDT 2011


tadamsmar wrote 2011-10-11 18:49:
> On Oct 11, 12:13 pm, Kenneth Fairfield<ken.fairfi... at gmail.com>
> wrote:
>> You don't say what your storage configuration is.
>> Are the shadow members both internal disks in the
>> DS10, or are they on an external controller?
>>
>> Reason I ask is that several years ago, on some
>> HSJ-hosted storage (IIRC, otherwise it could have
>> been on HSGs), one disk in a shadow set started
>> logging a large number of errors, on the order of
>> several hundred per minute.  Unfortunately, the
>> controller went to heroic efforts to recover!
>>
>> As a result, the shadow set was functionally
>> inaccessible. (Well, it was a bit more complicated
>> than that as I think we first tried copying in
>> a 3rd member per our standard procedures, but
>> that only made the problem worse.)
>>
>> In the end, we had to just yank the bad disk out.
>> The controller was determined *not* to drop the
>> bad member.
>>
>> So...  What do your system error logs show for the
>> bad disk?  What was the error count on that member
>> before it was dropped?
>>
>> With the HSJs, I think Compaq determined there was
>> some setting that we could apply that would keep
>> the controller from working so hard to recover.
>> Without knowing your storage configuration, there's
>> no way to say whether something similar applies.
>>
>> However, watching disk error counts is *very*
>> important in all cases.
>>
>>      -Ken
>
> The shadow set is just the two internal disks of the DS10
> configuration.
>
> I was watching the error count.  That's how I know we had a single
> error this morning; I had checked for disk errors only an hour before
> the event.
>
> I am going to start analyzing the error log.
>
> You guys have convinced me that this is not normal and may be an
> indicator of something more than a typical disk error.
>
> Thanks for your input.

So both members of the shadow set are on the same SCSI controller and the
same SCSI bus. A single failing disk can play havoc with the SCSI bus, I
guess, effectively blocking any access to the other shadow member.

For a DS10 in a critical application, I'd really recommend some external
box(es), preferably on two separate SCSI controllers, with shadowing
between the boxes. These are really cheap on the second-hand market today.






