[Info-vax] rule of thumb for replacing bad disks based on error count

Mon Jul 11 14:51:59 EDT 2011

If I see the error count increasing quickly on a physical disk, I will 
of course replace it.  My hope is that until I do so, HBVS will keep my 
data safe.  (For really important shadow sets, I have 3 members; for 
others, 2.)  But what about SLOWLY increasing error counts?  And what 
about errors on the shadow set itself, rather than on the members?

Physically bad sections of a disk can obviously cause errors, but what 
other causes of errors are there on physical disks and on shadow sets?

Is there any reason to suspect the physical disks, as opposed to 
controllers, cables, etc., if the error count increases on a shadow set?

Can I assume that if the error count increases on only one node, there 
is no danger of data loss (presumably because the problem cannot be on 
the disks; otherwise it would be visible on all nodes)?
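
To make the comparison behind that question concrete, here is a minimal 
sketch in Python.  It assumes I note the error count for one device on 
each node at two points in time (for example from SHOW DEVICE) and enter 
the numbers by hand.  The node names and counts are made up, and the 
function names are my own.  It only reports WHERE the count rose; 
whether "one node only" really rules out the disk is exactly what I am 
asking.

```python
def nodes_with_new_errors(before, after):
    """Return the nodes whose error count for one device rose between two samples."""
    return sorted(node for node in after if after[node] > before.get(node, 0))


def classify(before, after):
    """Describe where the error count for one device increased."""
    rising = nodes_with_new_errors(before, after)
    if not rising:
        return "no new errors"
    if len(rising) == 1 and len(after) > 1:
        return "one node only: " + rising[0]
    if len(rising) == len(after):
        return "all nodes"
    return "some nodes: " + ", ".join(rising)


# Hypothetical node names and counts, for illustration only.
before = {"NODEA": 3, "NODEB": 0, "NODEC": 1}
after = {"NODEA": 9, "NODEB": 0, "NODEC": 1}
print(classify(before, after))  # prints "one node only: NODEA"
```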

What is a good rule of thumb for replacing disks (in shadow sets) based 
on error count?