[Info-vax] Long uptime cut short by Hurricane Sandy
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Sat Jan 26 12:09:04 EST 2013
On 2013-01-26 00:17:28 +0000, AEF said:
> On Jan 25, 2:48 pm, Stephen Hoffman <seaoh... at hoffmanlabs.invalid>
> wrote:
>> Disaster Tolerance.
>
> Uh, I missed this part. There is no DR for these systems. There
> doesn't need to be. When Hurricane Sandy hit, the building lost power.
> We didn't get stable power back until about Jan 4. Nobody missed the
> VAXes except for me.
I had inferred that from your response.
Given you're not swapping batteries (~$25 for a Dallas, or less for a
coin cell, or probably ~$5 for a NiCd pack), the VAX boxes aren't
uptime-critical production servers. The boxes are effectively personal
computers or simple servers, but running a "weird" operating system on
"weird" hardware. Yes, these small computers (and even iPads) can and
do run critical apps, but seldom with requirements for redundancy and
continuous access.
> OK, I'm running VMS 6.2 with all relevant ECO kits applied. Can you
> give an example of a latent bug that might hit me? I have no apps
> running. I just use my DCL script once in a while and do an occasional
> backup. Thanks!
You have some fairly non-critical[1] computer systems here. While
these are VAX boxes, personal computers and tablets and small commodity
servers are more common for these roles and these tasks in recent
times, and these are generally cheaper to power and program and deal
with.
If these VAX systems were within my purview, I'd look at VAX
emulation, at porting the DCL procedures, or at both. The port could go
to newer VMS boxes on newer (and probably used) Itanium hardware, or
(at least for the data) all the way to commodity platform hardware. I'd
also pursue reasonable opportunities to consolidate onto fewer boxes.
Fix the short-term problems at minimal cost and effort (when that
effort becomes necessary), and then remove VAX hardware (via
consolidation onto fewer VAX boxes and via emulation, or via a platform
port), and potentially remove VMS entirely.
Given the problems you're reporting with the hardware, I'd prototype
and cost out consolidation, VAX emulation, and the two sorts of ports,
plus continued operations until the port is completed.
That's probably already the long-term management plan[2] for these
boxes anyway, though it probably won't go forward with any priority
until the existing VAX hardware no longer meets your already apparently
minimal requirements.
Depending on the nature of use here (yours is apparently fairly
low-grade production usage) and pending wholesale replacement of the
existing VAX hardware, you might monitor for hardware errors being
logged, monitor the error logs for memory errors, and consider
preemptive replacements of at least the hard disks, and possibly
migrating the storage out into SBBs or other analogous external
shelves. I'd probably also not bother with periodic system-wide
backups here, probably not even backing up the code; just the data.
Back up the non-volatile stuff once a year and after changes[3], and keep
the periodic backups of the data off of the box. Given the existing
minimal investment in these servers, well, who cares what happens here?
Keep the data, and be prepared to carry the scrap out when the box dies.
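
As a rough sketch of the error monitoring suggested above, something
along these lines could be run by hand or from a periodic batch job.
The output file name is a placeholder, the time window is arbitrary,
and reading the error log typically needs a suitably privileged
account; treat this as an illustration, not a tested procedure.

```dcl
$! Illustrative only: a quick look at logged hardware errors.
$! SYS$LOGIN:ERRLOG_RECENT.LIS is a hypothetical output file name.
$!
$! Summary of devices, CPUs and memory with non-zero error counts.
$ SHOW ERROR
$!
$! Report on memory and disk entries logged since yesterday, read from
$! the default error log file (SYS$ERRORLOG:ERRLOG.SYS).
$ ANALYZE/ERROR_LOG/SINCE=YESTERDAY/INCLUDE=(MEMORY,DISKS) -
        /OUTPUT=SYS$LOGIN:ERRLOG_RECENT.LIS
```

Rising counts from SHOW ERROR, or repeated memory entries in the
report, would be the cue to swap the disk or the memory before it
fails outright.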
------------
[1] The DCL application(s) might be critical, but there are clearly
also relaxed requirements around continuous access, and hardware
maintenance.
[2] Variously also known as "run it into the ground, then replace it."
[3] Possibly even performed automatically as part of SYSHUTDWN.COM or
as a site-local wrapper around SHUTDOWN.COM, or as a part of
SYSHUTDWN_0010.COM on newer releases.
--
Pure Personal Opinion | HoffmanLabs LLC