[Info-vax] Long uptime cut short by Hurricane Sandy

AEF spamsink2001 at yahoo.com
Fri Jan 25 19:17:28 EST 2013


On Jan 25, 2:48 pm, Stephen Hoffman <seaoh... at hoffmanlabs.invalid>
wrote:
> On 2013-01-25 19:31:20 +0000, AEF said:
>
> > On Jan 25, 2:47 pm, Stephen Hoffman <seaoh... at hoffmanlabs.invalid>
> > wrote:
> >> On 2013-01-25 18:34:48 +0000, Bill Gunshannon said:
>
> >>> And less than a month ago we heard right here:
>
> >>>         >> Long server uptimes are the antithesis of testing.
>
> >>> Go figure....
>
> >> And you'll hear it again.   That, and the benefits of testing your DT.
>
> > DT?
>
> Disaster Tolerance.

Hi Hoff!

Uh, I missed this part. There is no DR for these systems. There
doesn't need to be. When Hurricane Sandy hit, the building lost power.
We didn't get stable power back until about Jan 4. Nobody missed the
VAXes except for me.

I did have to manually update our Application Owner Table (AOT) once,
but that's my problem and I'm okay with it. That update is pretty much
all I use the VAX for, a few times a year. (I wrote some DCL to convert
Excel exports to wiki format for posting on our Confluence wiki. It
discards extra columns, massages the data and such, and outputs the
result in wiki format.) So there's no need to do any Disaster Recovery
setup, testing, etc. The boss wouldn't have allowed me to do any DR
setup for this anyway. It's not worth it.

Now, I *did* do DR work for the apps I'm responsible for: JIRA and
Confluence. They run on Unix systems. I set up the daily backup-to-the-
DR-site routine and developed a procedure. We tested it a while back
and it worked fine. We knew the storm was coming so I checked things
again right before.

We recovered everything save perhaps less than one day's worth of
attachments (which we deemed acceptable ahead of time).

But the VAXes I mentioned were not missed, I'm sorry to say (except
for my needing to do one manual AOT update). (The Financial Crisis
killed my trading desks, which is what the VAXes were used for. Nice.)

> > I don't know. People used to brag of long uptimes here.
>
> Yep.  I thought it was cool, too.  Then I thought of the implications
> of what it meant.
>
> > Usage of these systems has been stable to the nth degree. No hardware
> > changes. No software changes. Almost zero use and not in the least bit
> > critical.
>
> Uh-huh.

I said they're not the least bit critical.

> Have you not noticed the flurry of reboot problems with some subset of
> Alpha systems?

Nope. I don't read much in this group anymore. I just peek in from
time to time. And these are VAXes, not Alphas. (^_^)

> If you don't test it, you can't be sure it'll work.

That's okay if it doesn't work in a disaster. It just makes it a
little harder for me to update the AOT when the big boss sends me a
new spreadsheet. No data is lost. All the important data on all of the
VAXes are backed up on tape at Recall and on disks across "The Pond"
in London. And most of it's over 7 years old, which we no longer need
to keep. So I'm okay.

>
> > I've been the only user on the first box during this time period. I
> > use it for occasional tasks that could be done elsewhere if really
> > needed. OK, some monitoring for a short while, which was also stable.
>
> > No one's used the second box at all for almost as long.
>
> > Don't judge without all the facts.
>
> So.  Will your box reboot automatically?  Are you feeling lucky?  Do
> you have all the facts?

No. The battery is dead, so it will ask for the time. Also, I don't
need it to come up automatically.

Yes. (These are VAX systems!) OK, my primary worry here is that the
power supply might go kablooey. (I've lost a few, given we had as many
as 40 MicroVAXes on line in my early days at the company. And a
handful of disk drives bit the dust. And we had local backup systems
and across-the-Pond DR systems, and that saved us a couple of times.)
But I have the data backed up, and I have lots of spare disks and
power supplies from the several dozen VAXes we have sitting around. I
believe I'm okay.

Yes (see above).

>
> Uptime looks great.  On paper.  Then you slam into reality.  Mistakes
> happen.  Latent bugs become less-than-latent.  That's why we test.

Agreed. You may also need to AUTOGEN once in a while.

OK, I'm running VMS 6.2 with all relevant ECO kits applied. Can you
give an example of a latent bug that might hit me? I have no apps
running. I just use my DCL script once in a while and do an occasional
backup. Thanks!

>
> --
> Pure Personal Opinion | HoffmanLabs LLC

AEF


