[Info-vax] rule of thumb for replacing bad disks based on error count

Hans Vlems hvlems at freenet.de
Tue Jul 12 03:05:46 EDT 2011


On 11 jul, 20:51, hel... at astro.multiCLOTHESvax.de (Phillip Helbig---
undress to reply) wrote:
> Obviously, if I see the error count increasing quickly on a physical
> disk, I will replace it.  My hope is that until I do so, HBVS will keep
> my data safe.  (For really important shadow sets, I have 3 members; for
> others, 2.)  But what about for SLOWLY increasing error counts?  And
> what about errors on the shadow set itself, rather than on the members?
>
> Obviously, physically bad sections of a disk can cause errors, but what
> are other causes of error on physical disks and on shadow sets?
>
> Is there any reason to suspect the physical disks, as opposed to
> controllers, cables etc, if the error count increases on a shadow set?
>
> Can I assume if the error count increases on only one node, then there
> is no danger of data becoming lost (presumably because the problem
> cannot be on the disks, otherwise it would be visible on all nodes)?
>
> What is a good rule of thumb for replacing disks (in shadow sets) based
> on error count?

It depends on how the disk is connected to the VMS system. For non-shadowed
disks that are connected to a SCSI controller of the system itself, a single
error is sufficient for me to swap the disk.

Disks mounted clusterwide:
Apart from one DSSI cluster, all other clusters are NI-based, and the
switches in my home LAN are not manufactured by DEC or DNPG, let alone Cisco.
So a couple of errors (max 5) in 24 hrs is just about as high as I'd like to
see that counter go. The error log is useful: if PEDRIVER is involved then the
LAN is at fault, and I more or less expect that to happen. READ or WRITE
errors on the device are an indication to swap the disk.

Shadow sets: I have no shadow sets with volumes spread across multiple
cluster members. The same rule applies as for local SCSI disks: a single
error and the volume is replaced. The problem is that I only have shadow sets
with identical disks, and the VAXes tend to have fairly old disks
(RZ26/RZ25/RZ24) with many differences in hardware. There are many RZ26-L
models, all different.
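
To put the three cases side by side, here is a rough sketch of the rule of
thumb in Python. It is only an illustration: the attachment labels, the
DiskErrors record and the advice() function are made-up names, the counts are
assumed to have been collected by hand from the error count and the error log
over the last 24 hours, and treating "more than 5" as "go and look at the LAN"
is my reading of the threshold, not a hard limit.

```python
from dataclasses import dataclass

# Hypothetical labels for how a disk is attached (not VMS terminology).
LOCAL_SCSI = "local_scsi"        # non-shadowed disk on the system's own SCSI controller
CLUSTER_WIDE = "cluster_wide"    # disk mounted clusterwide over the NI (LAN)
SHADOW_MEMBER = "shadow_member"  # member of a shadow set

# "a couple of errors (max 5) in 24 hrs" for clusterwide disks.
CLUSTER_MAX_ERRORS_PER_DAY = 5


@dataclass
class DiskErrors:
    attachment: str
    device_errors: int = 0    # READ/WRITE errors on the device, last 24 hrs
    pedriver_errors: int = 0  # error log entries involving PEDRIVER, last 24 hrs


def advice(d: DiskErrors) -> str:
    """Return a rough recommendation for one disk."""
    if d.attachment in (LOCAL_SCSI, SHADOW_MEMBER):
        # A single error of any kind is enough to replace the disk or volume.
        if d.device_errors + d.pedriver_errors > 0:
            return "swap disk"
        return "ok"
    # Clusterwide: READ/WRITE errors point at the disk itself.
    if d.device_errors > 0:
        return "swap disk"
    # PEDRIVER entries point at the LAN; a handful per day is expected.
    if d.pedriver_errors > CLUSTER_MAX_ERRORS_PER_DAY:
        return "investigate LAN"
    return "ok"


if __name__ == "__main__":
    print(advice(DiskErrors(LOCAL_SCSI, device_errors=1)))
    print(advice(DiskErrors(CLUSTER_WIDE, pedriver_errors=3)))
    print(advice(DiskErrors(CLUSTER_WIDE, pedriver_errors=8)))
```

The point of keeping the two counts separate is that the same error count
means different things depending on what the error log says caused it.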

Hans


