[Info-vax] Long uptime cut short by Hurricane Sandy

AEF spamsink2001 at yahoo.com
Fri Jan 25 16:01:27 EST 2013


On Jan 25, 3:48 pm, Stephen Hoffman <seaoh... at hoffmanlabs.invalid>
wrote:
> On 2013-01-25 19:31:20 +0000, AEF said:
>
> > On Jan 25, 2:47 pm, Stephen Hoffman <seaoh... at hoffmanlabs.invalid>
> > wrote:
> >> On 2013-01-25 18:34:48 +0000, Bill Gunshannon said:
>
> >>> And less than a month ago we heard right here:
>
> >>>         >> Long server uptimes are the antithesis of testing.
>
> >>> Go figure....
>
> >> And you'll hear it again.   That, and the benefits of testing your DT.
>
> > DT?
>
> Disaster Tolerance.
>
> > I don't know. People used to brag of long uptimes here.
>
> Yep.  I thought it was cool, too.  Then I thought of the implications
> of what it meant.

I always thought that with such uptimes you can't AUTOGEN the system.
This is rather obvious. But people posted records anyway and everyone
bragged about how stable VMS was. So I thought I'd post mine.

>
> > Usage of these systems has been stable to the nth degree. No hardware
> > changes. No software changes. Almost zero use and not in the least bit
> > critical.
>
> Uh-huh.
> Have you not noticed the flurry of reboot problems with some subset of
> Alpha systems?

No.

> If you don't test it, you can't be sure it'll work.

Test it how? Reboot it every day? Backup and restore every day?

Trust me. If you'd come here and I showed you everything you'd
understand.

>
> > I've been the only user on the first box during this time period. I
> > use it for occasional tasks that could be done elsewhere if really
> > needed. OK, some monitoring for a short while, which was also stable.
>
> > No one's used the second box at all for almost as long.
>
> > Don't judge without all the facts.
>
> So.  Will your box reboot automatically?  Are you feeling lucky?  Do
> you have all the facts?

No, and I don't need it to. And I knew that because the battery's
dead: upon booting it wants the time. And if the box dies altogether
it's not a big deal. I could even reconstruct everything on another
box from backups (we've got several dozen MicroVAX 3180s and a few
3195s). (Yes, I've tested standalone BACKUP.) And even if not, I can do what I do
on it elsewhere (it would take more work, of course). No one else uses
the box. I didn't even have it up for a couple of weeks after we
finally got back into our building because I didn't need it.

One day the A/C in our "data center" (it's a large computer room) died
and the temp went up to 100 deg. F or so. My colleague to my right
told me about it and thought some of my systems would be damaged. I
thought about it for a few seconds and told him they'd be fine. Turned
out 1 disk out of several dozen died.

Yes.

Yes.

> Uptime looks great.  On paper.  Then you slam into reality.  Mistakes
> happen.  Latent bugs become less-than-latent.  That's why we test.

At my previous job I AUTOGENed every weekend. We went through
extensive DR testing. We prepared big time for Y2K. This was in the
World Trade Center. After 9/11 all my backup and DR testing efforts
proved 100% successful. Recovery was flawless. OK?

At one job I interviewed at in the late '90s they rebooted the VAX
every day!

>
> --
> Pure Personal Opinion | HoffmanLabs LLC

OK.

AEF


