[Info-vax] Long uptime cut short by Hurricane Sandy

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Sat Jan 26 12:09:04 EST 2013


On 2013-01-26 00:17:28 +0000, AEF said:

> On Jan 25, 2:48 pm, Stephen Hoffman <seaoh... at hoffmanlabs.invalid>
> wrote:
>> Disaster Tolerance.
> 
> Uh, I missed this part. There is no DR for these systems. There
> doesn't need to be. When Hurricane Sandy hit, the building lost power.
> We didn't get stable power back until about Jan 4. Nobody missed the
> VAXes except for me.

I had inferred that from your response.

Given you're not swapping batteries (~$25 for a Dallas, or less for a 
coin cell, or probably ~$5 for a NiCd pack), the VAX boxes aren't 
uptime-critical production servers.  The boxes are effectively personal 
computers or simple servers, but running a "weird" operating system on 
"weird" hardware.  Yes, these small computers (and even iPads) can and 
do run critical apps, but seldom with requirements for redundancy and 
continuous access.


> OK, I'm running VMS 6.2 with all relevant ECO kits applied. Can you
> give an example of a latent bug that might hit me? I have no apps
> running. I just use my DCL script once in a while and do an occasional
> backup. Thanks!

You have some fairly non-critical[1] computer systems here.  While 
these are VAX boxes, personal computers and tablets and small commodity 
servers are more common for these roles and these tasks in recent 
times, and these are generally cheaper to power and program and deal 
with.

If these VAX systems were within my purview, I'd look at VAX
emulation, at porting the DCL procedures, or both.  The port would be
either to newer VMS boxes on newer (and probably used) Itanium
hardware, or (at least for the data) all the way to commodity platform
hardware.  I'd also pursue reasonable opportunities to consolidate
onto fewer boxes.

Fix the short-term problems at minimal cost and effort (when that 
effort becomes necessary), and then remove VAX hardware (via 
consolidation onto fewer VAX boxes and via emulation, or via a platform 
port), and potentially remove VMS entirely.

Given the problems you're reporting with the hardware, I'd prototype
and cost out consolidation, VAX emulation, and the two sorts of ports,
along with continued operations until the port is completed.  That's
probably already the long-term management plan[2] for these boxes
anyway, though it probably won't go forward with any priority until
the existing VAX hardware no longer meets your already apparently
minimal requirements.

Depending on the nature of use here (yours is apparently fairly
low-grade production usage), and pending wholesale replacement of the
existing VAX hardware, you might monitor the error logs for hardware
errors, and for memory errors in particular, and consider preemptive
replacement of at least the hard disks, possibly migrating the storage
out into SBBs or other analogous external shelves.  I'd probably also
not bother with periodic system-wide backups here, and probably not
even with backing up the code; just the data.  Back up the
non-volatile stuff once a year and after changes[3], and keep the
periodic backups of the data off of the box.  Given the existing
minimal investment in these servers, well, who cares what happens
here?  Keep the data, and be prepared to carry the scrap out when the
box dies.
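
As a rough sketch of the data-only backup at shutdown mentioned in
footnote [3], something along these lines could go into
SYS$MANAGER:SYSHUTDWN.COM.  The source disk DKA100:, the [DATA]
directory tree, the tape drive MKA500:, and the save-set name are all
placeholders I've made up for illustration; substitute whatever the
site actually uses, and test it before trusting it.

```
$! Hypothetical fragment for SYS$MANAGER:SYSHUTDWN.COM.
$! DKA100:[DATA...], MKA500: and DATA.BCK are assumed names, not
$! taken from the systems under discussion.
$ SET NOON                      ! a failed backup must not abort the shutdown
$ MOUNT/FOREIGN MKA500:         ! BACKUP wants the tape mounted foreign
$ BACKUP/LOG/VERIFY/IGNORE=INTERLOCK -
        DKA100:[DATA...]*.*;* -
        MKA500:DATA.BCK/SAVE_SET/REWIND
$ DISMOUNT/UNLOAD MKA500:       ! eject, so the tape can leave the box
$ EXIT
```

Unloading the tape at the end is deliberate: the point is to get the
copy of the data off of the box, not to leave it sitting in the drive.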

————
[1] The DCL application(s) might be critical, but there are clearly 
also relaxed requirements around continuous access, and hardware 
maintenance.
[2] Variously also known as "run it into the ground, then replace it."
[3] Possibly even performed automatically as part of SYSHUTDWN.COM or 
as a site-local wrapper around SHUTDOWN.COM, or as a part of 
SYSHUTDWN_0010.COM on newer releases.

-- 
Pure Personal Opinion | HoffmanLabs LLC
