[Info-vax] Problems detected with analyze/disk

anwa anders_s_wallin at yahoo.se
Mon Nov 23 09:12:33 EST 2009


On 23 Nov, 14:16, JF Mezei <jfmezei.spam... at vaxination.ca> wrote:
> anwa wrote:
> > We are currently double checking the SAN configuration.
> > The files marked with MULTALLOC contained corrupted data.
> > There was an average of 5-20 corruptions on each disk.
>
> How often did you do ana/disk  ? Is it possible that those accumulated
> over the years undetected, or would they have happened all recently ?
>
> Is it possible that they happened a long time ago and whatever
> software/conditions caused that is no longer on your system ?
>
> > Out of 15 disks on two clusters half had corruptions. Most are now
> > fixed but the ones with "unprintable" filenames remain.
>
> When you boot, do you have a /NOREBUILD parameter to the MOUNT command
> in your boot up sequence ? Some sites have this to allow the system to
> return to service faster, but if you don't do a SET VOLUME/REBUILD later
> on you are liable to get some funky stuff. (not sure if this would
> result in multiply allocated blocks though).
>
> You should really track down any trend/common denominator for the
> affected files. Creation dates, file owners, and look inside contents to
> spot if you see one application's data come up often. In other words, do
> some forensics to find out what application was involved.
>
> Doing a backup may eliminate existing events, but it may not prevent the
> problem from recurring.
>
> Did ANA/DISK uncover any other problems with the disk ?
>
> Is it possible someone corrupted the bitmap.sys file, telling the system
> that some blocks were free when they were not  ? (for instance, restore
> the file from an older backup).

ANAL/DISK has not been run (to my knowledge) for a long time so I
could not really say how old the errors are. That is partly the reason
for trying to get the disks into shape and running anal/disk
regularly.

There were other faults as well. The _most_ corrupted disk had:
(DELHEADER, MULTALLOC, BAD_DIRFIDSEQ, BADDIREN, LOSTHEADER).

The most common errors were DELHEADER and BADOWNER.
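
To keep track of which errors dominate on each disk, the output of each
run could be saved to a file and the message identifiers counted. The
following is only a rough sketch in Python: it assumes the messages use
the usual %FACILITY-S-IDENT form with ANALDISK as the facility, and the
sample lines are made up for illustration, not copied from our disks.

```python
import re
from collections import Counter

# Assumption: messages look like "%ANALDISK-W-DELHEADER, ...".
# The capture group picks out the identifier (DELHEADER, MULTALLOC, ...).
MSG_RE = re.compile(r"%ANALDISK-[A-Z]-([A-Z0-9_]+),")

def tally_messages(lines):
    """Count how often each message identifier appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = MSG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample output; file IDs and names are invented.
sample = [
    "%ANALDISK-W-DELHEADER, file (1234,5,1) A.TMP;1 marked for delete",
    "%ANALDISK-W-BADOWNER, file (2345,6,1) B.DAT;3 has inconsistent owner",
    "%ANALDISK-W-DELHEADER, file (3456,7,1) C.TMP;2 marked for delete",
    "%ANALDISK-W-MULTALLOC, file (4567,8,1) D.OBJ;1 multiply allocated blocks",
    "some continuation line without a message prefix",
]

counts = tally_messages(sample)
for ident, n in counts.most_common():
    print(f"{ident:15} {n}")
```

Run against one saved listing per disk, this would give a quick
comparison between disks and between successive runs.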

The systems have been used for development all the time and the
software tools used have been more or less stable over a long time.
There have been version upgrades, but that is about it.



