[Info-vax] Unpleasant Disk Shadowing Surprise

tadamsmar tadamsmar at yahoo.com
Wed Oct 12 12:58:03 EDT 2011


On Oct 12, 10:04 am, koeh... at eisner.nospam.encompasserve.org (Bob
Koehler) wrote:
> In article <c8f1c0db-56f1-482d-8e60-e7ab42aa9... at z8g2000yqb.googlegroups.com>, tadamsmar <tadams... at yahoo.com> writes:
>
> > According to the console log, more than 30 seconds after the watchdog
> > sounded, the shadow set changed state, the offending disk went
> > offline, a mount verification started and completed.
>
> > Immediately after the mount verification completed, VMS started
> > working again.
>
> [...]
>
> > Is this to be expected?  We have had VMS and disk shadowing running
> > the application for 20 years or so, but I don't know that we have ever
> > had a disk error while the watchdog was configured to sound, so we
> > might not have noticed the halting and recovery.
>
>    While 3 minutes will kill almost any real-time application (depending
>    on the definition of "real-time"), every other system I've tried
>    would have crashed during such an event (many aren't supposed to).
>
>    Even VMS can't reach out and patch up broken hardware, but the only
>    reason you'd see this behaviour would be if you were actually
>    accessing the disk with the problem.  Most of the time for such
>    issues I never actually saw a problem until my overnight backup
>    accessed an otherwise rarely visited file.  (And my real-time
>    application, which couldn't tolerate 10ms, wasn't running.)
>
>    If your definition of real-time can't handle 3 minutes of
>    interruption, then you probably need to engineer a different solution
>    than the kind of shadowing approach you're using now.
>
>    But 3 minutes over 20 years is better than 99.9999%.  I can remember
>    when "5 nines" was all the rage, and you got 6.
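
For anyone who wants to check the "6 nines" arithmetic above, here is a
minimal sketch. It assumes a single 3-minute outage over 20 years of
otherwise continuous operation and an average year of 365.25 days; the
function names are made up for illustration. Any earlier outages that
went unnoticed would lower the figure.

```python
import math

# Assumption: average year length, including leap days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def availability(downtime_minutes, years):
    """Fraction of the period during which the system was up."""
    total_minutes = years * MINUTES_PER_YEAR
    return 1.0 - downtime_minutes / total_minutes


def nines(avail):
    """Count of leading nines, e.g. 0.9999 -> 4."""
    return math.floor(-math.log10(1.0 - avail))


a = availability(downtime_minutes=3, years=20)
print(f"availability: {a * 100:.5f}%")
print(f"nines: {nines(a)}")
```

Under those assumptions the downtime fraction is about 3 parts in 10.5
million, which is where the claim of better than 99.9999% comes from.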

I was thinking along those lines. But now that someone has laid out
the argument, me being the argumentative sort, my mind jumped to
a possible refutation as follows:

Our hardware purchases don't come out of our profit.  But lack of
availability of the hardware does come out of our profit.  And, I
am responsible for the computers, so I would be the goat if our
profit got circumcised. Or perhaps I should say I'd be the mohel!

QED...

I do have to get the hardware purchase approved, but some of the
mitigating solutions are cheap enough I think.

I am really thankful for everyone's help here on this newsgroup!


