[Info-vax] Unpleasant Disk Shadowing Surprise

tadamsmar tadamsmar at yahoo.com
Tue Oct 11 15:04:06 EDT 2011


On Oct 11, 1:12 pm, Jan-Erik Soderholm <jan-erik.soderh... at telia.com>
wrote:
> tadamsmar wrote 2011-10-11 18:49:
>
> > On Oct 11, 12:13 pm, Kenneth Fairfield <ken.fairfi... at gmail.com>
> > wrote:
> >> You don't say what your storage configuration is.
> >> Are the shadow members both internal disks in the
> >> DS10, or are they on an external controller?
>
> >> Reason I ask is that several years ago, on some
> >> HSJ-hosted storage (IIRC, otherwise it could have
> >> been on HSGs), one disk in a shadow set started
> >> logging a large number of errors, on the order of
> >> several hundred per minute.  Unfortunately, the
> >> controller went to heroic efforts to recover!
>
> >> As a result, the shadow set was functionally
> >> inaccessible. (Well, it was a bit more complicated
> >> than that as I think we first tried copying in
> >> a 3rd member per our standard procedures, but
> >> that only made the problem worse.)
>
> >> In the end, we had to just yank the bad disk out.
> >> The controller was determined *not* to drop the
> >> bad member.
>
> >> So...  What do your system error logs show for the
> >> bad disk?  What was the error count on that member
> >> before it was dropped?
>
> >> With the HSJs, I think Compaq determined there was
> >> some setting that we could apply that would keep
> >> the controller from working so hard to recover.
> >> Without knowing your storage configuration, there's
> >> no way to say whether something similar applies.
>
> >> However, watching disk error counts is *very*
> >> important in all cases.
>
> >>      -Ken
>
> > The shadow set is just the two internal disks of the DS10
> > configuration.
>
> > I was watching the error count. That's how I know we had a single
> > error this morning; I had checked for disk errors only an hour
> > before the event.
>
> > I am going to start analyzing the error log.
>
> > You guys have convinced me that this is not normal and may be an
> > indicator of something more than a typical disk error.
>
> > Thanks for your input.
>
> So both members of the shadow set are on the same SCSI controller and the
> same SCSI bus. One disk can play havoc with the SCSI bus, I guess,
> effectively blocking any access to the other shadow member.
>
> For a DS10 in a critical application, I'd really recommend some external
> box(es), preferably on two separate SCSI controllers, with shadowing
> between the boxes. This is really cheap on the second-hand market today.

Can you point me to some specific items or vendors?

We might just replace the single SCSI card and the disks, but I want
to explore other options.

(The SCSI card had errors too.)