[Info-vax] Eisner? Down? (10 days later)
Richard B. Gilbert
rgilbert88 at comcast.net
Sun Jan 4 23:38:45 EST 2009
G Cornelius wrote:
> DeCoy wrote:
>> Thanks, George. The problem appears to be storage-related, perhaps with the
>> RAID array on the Mylex controller, and perhaps with either controller
>> hardware or controller configuration.
>>
>> Expertise in diagnosing (and perhaps fixing) Mylex controller symptoms would
>> be initially useful.
>
> I won't be of much help - my experience is with the HSJ/HSZ/HSG controller
> series.
>
> I did leave voice mail for Steve offering my services, but you folks
> will probably do better diagnosing it remotely than me trying to get
> involved. Let me know, though, if I can do something, even if it's
> just getting him some spare parts.
>
> Coincidentally, the reason I am not using the DS20 that's in my garage
> is that the Mylex (KZPBC?) controller failed when I was trying to configure
> it and I have not yet sprung for a replacement or stuffed in a non-raid
> SCSI card.
>
> I know of others around here who have used the Mylex controller and
> have encountered some of its quirks. I seem to remember helping someone
> on the research side of things restore a backup of what was at the time
> a large (30GB) raid volume that was lost due to Mylex controller issues,
> or perhaps due to not noticing that a raid disk had failed until a
> second failure made recovery impossible.
>
It seems to me that it's a SYS$MANGLER's JOB to notice things like
failing disks. I had a batch job called "MORNING_CHECK" that ran every
day at 07:30. It compared the output of "SHOW ERROR" with the output
from yesterday. It checked log files for errors ("-E-" and "-F-"), etc.
If it found something that looked like a problem, I was notified by
a text message to my pager. This gave me time to work on the problem
before it turned into a crisis!

A failed disk was not allowed to become a problem! I would swap it out
with a spare and call DEC/Compaq/HP to pick up the dear departed and
bring me a replacement drive.

In fact, thanks to MORNING_CHECK, I usually found disks that were
developing problems before the problems developed fully. One error was
allowed, but when a disk started logging multiple errors, I swapped it
out with a spare and called for a replacement. The same guy who fetched
replacements for field service would fetch me a new one and I gave him
the dear departed!
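
For anyone who wants to try the same idea, here is a rough sketch of the two checks described above, written in Python rather than DCL. It is not the original MORNING_CHECK procedure. The function names are made up for illustration, and the parser assumes each device line of saved "SHOW ERROR" output looks like "<device>:  <count>", which may need adjusting for a real system.

```python
import re


def parse_error_counts(show_error_text):
    """Map device name -> error count from saved SHOW ERROR-style text.

    Assumed layout (hypothetical): one "<device>:   <count>" pair per line.
    Header lines and anything else that does not match are skipped.
    """
    counts = {}
    for line in show_error_text.splitlines():
        m = re.match(r"^\s*(\S+:)\s+(\d+)\s*$", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


def new_errors(yesterday, today):
    """Return {device: increase} for devices whose count rose since yesterday."""
    return {
        dev: count - yesterday.get(dev, 0)
        for dev, count in today.items()
        if count > yesterday.get(dev, 0)
    }


def log_problems(lines):
    """Return log lines carrying an error (-E-) or fatal (-F-) severity code."""
    return [ln for ln in lines if "-E-" in ln or "-F-" in ln]
```

Anything returned by new_errors() or log_problems() would then be handed to whatever does the notifying (a pager message in my case). A device that did not appear yesterday is treated as having had zero errors, so a newly listed device with errors is reported too.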