[Info-vax] Long uptime cut short by Hurricane Sandy
David Froble
davef at tsoft-inc.com
Fri Jan 25 23:28:51 EST 2013
AEF wrote:
> On Jan 25, 2:48 pm, Stephen Hoffman <seaoh... at hoffmanlabs.invalid>
> wrote:
>> On 2013-01-25 19:31:20 +0000, AEF said:
>>
>>> On Jan 25, 2:47 pm, Stephen Hoffman <seaoh... at hoffmanlabs.invalid>
>>> wrote:
>>>> On 2013-01-25 18:34:48 +0000, Bill Gunshannon said:
>>>>> And less than a month ago we heard right here:
>>>>> >> Long server uptimes are the antithesis of testing.
>>>>> Go figure....
>>>> And you'll hear it again. That, and the benefits of testing your DT.
>>> DT?
>> Disaster Tolerance.
>
> Hi Hoff!
>
> Uh, I missed this part. There is no DR for these systems. There
> doesn't need to be. When Hurricane Sandy hit, the building lost power.
> We didn't get stable power back until about Jan 4. Nobody missed the
> VAXes except for me.
>
> I did have to manually update our Application Owner Table once, but
> that's my problem and I'm okay with it. And that's pretty much all I
> use it for. I use it for this a few times a year. So there's no need
> to do any Disaster Recovery setup, testing, etc. (I wrote some DCL to
> convert some Excel exports to wiki format to post on our Confluence
> wiki. I have to discard extra columns, massage the data and such, and
> output it in wiki format.) The boss wouldn't have allowed me to do any
> DR setup for this anyway. It's not worth it.
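
For anyone wondering what that kind of export-to-wiki conversion involves, here is a minimal sketch in Python rather than DCL. It is not AEF's script. The column names ("App", "Owner", "Notes") and the function name are made up for illustration, and it assumes the Excel export is plain CSV with a header row. It drops unwanted columns, trims the cells, and emits Confluence wiki table markup (`||header||` rows, `|cell|` rows).

```python
import csv
import io


def csv_to_confluence(csv_text, keep):
    """Return Confluence wiki table markup for the columns named in `keep`.

    `keep` is a list of header names (hypothetical here); any other
    column in the export is discarded.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Header row uses doubled pipes in Confluence wiki markup.
    lines = ["||" + "||".join(keep) + "||"]
    for row in reader:
        cells = []
        for col in keep:
            # Trim whitespace and escape literal pipes inside a cell.
            text = (row.get(col) or "").strip().replace("|", "\\|")
            # An empty cell needs a space, or the table row collapses.
            cells.append(text or " ")
        lines.append("|" + "|".join(cells) + "|")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data standing in for a spreadsheet export.
    sample = "App,Owner,Notes\nJIRA, aef ,x\nConfluence,,y\n"
    print(csv_to_confluence(sample, ["App", "Owner"]))
```

The real job presumably has more "massaging" than this, but the shape is the same: read rows, pick columns, write one marked-up line per row.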
>
> Now, I *did* do DR work for the apps I'm responsible for: JIRA and
> Confluence. They run on Unix systems. I set up the daily backup-to-the-
> DR-site routine and developed a procedure. We tested it a while back
> and it worked fine. We knew the storm was coming so I checked things
> again right before.
>
> We recovered everything save perhaps less than one day's worth of
> attachments (which we deemed acceptable ahead of time).
>
> But the VAXes I mentioned were not missed, I'm sorry to say (except
> for my needing to do one manual AOT update). (The Financial Crisis
> killed my trading desks, which is what the VAXes were used for. Nice.)
>
>>> I don't know. People used to brag of long uptimes here.
>> Yep. I thought it was cool, too. Then I thought of the implications
>> of what it meant.
>>
>>> Usage of these systems has been stable to the nth degree. No hardware
>>> changes. No software changes. Almost zero use and not in the least bit
>>> critical.
>> Uh-huh.
>
> I said they're not the least bit critical.
>
>> Have you not noticed the flurry of reboot problems with some subset of
>> Alpha systems?
>
> Nope. I don't read much in this group anymore. I just peek in from
> time to time. And these are VAXes, not Alphas. (^_^)
>
>> If you don't test it, you can't be sure it'll work.
>
> That's okay if it doesn't work in a disaster. It just makes it a
> little harder for me to update the AOT when the big boss sends me a
> new spreadsheet. No data is lost. All the important data on all of the
> VAXes are backed up on tape at Recall and on disks across "The Pond"
> in London. And most of it's over 7 years old, which we no longer need
> to keep. So I'm okay.
>
>>> I've been the only user on the first box during this time period. I
>>> use it for occasional tasks that could be done elsewhere if really
>>> needed. OK, some monitoring for a short while, which was also stable.
>>> No one's used the second box at all for almost as long.
>>> Don't judge without all the facts.
>> So. Will your box reboot automatically? Are you feeling lucky? Do
>> you have all the facts?
>
> No. The battery is dead, so it will ask for the time. Also, I don't
> need it to come up automatically.
>
> Yes. (These are VAX systems!) OK, my primary worries here are that the
> power supply might go kablooey. I've lost a few, given we had as many
> as 40 MicroVAXes on line in my early days at the company. And a
> handful of disk drives bit the dust. (And we had local backup systems
> and across-the-Pond DR systems, and that saved us a couple of times.)
> But I have the data backed up, and I have lots of spare disks and
> power supplies from the several dozen VAXes we have sitting around. I
> believe I'm okay.
>
> Yes (see above).
>
>> Uptime looks great. On paper. Then you slam into reality. Mistakes
>> happen. Latent bugs become less-than-latent. That's why we test.
>
> Agreed. You may also need to AUTOGEN once in a while.
>
> OK, I'm running VMS 6.2 with all relevant ECO kits applied. Can you
> give an example of a latent bug that might hit me? I have no apps
> running. I just use my DCL script once in a while and do an occasional
> backup. Thanks!
>
>> --
>> Pure Personal Opinion | HoffmanLabs LLC
>
> AEF

I used to have my customers reboot at least once a month. It wasn't
needed, but it didn't hurt.

Uptime is basically an ego trip. Thing is, VMS can do it; some others
can't. Weendoze can if you don't run anything.

If you've got a MicroVAX 3100 model 98 sitting around that you're not fond
of, I could give it a good home.