[Info-vax] Bouncing disk packs!

Galen no_email at invalid.invalid
Tue Oct 18 09:23:53 EDT 2022


chris <chris-nospam at tridac.net> wrote:
> Had a single RA60 in the lab here for a while, Extremely heavy, just
> managed to lift it and sounded like a gas turbine spinning up. Power
> consumption around 400 watts for a single drive, 205Mb packs. Still
> have packs, but no way to read them anymore...
> 
> Chris
> 
 At LMSC in Sunnyvale we had a 785 that booted from one of two RM05 drives.
The system disk was (we thought) backed up several times a week to several
tape volumes, and periodically to one of two other RM05 packs (which was
used for the next boot), during the overnight shift. The regular operator
did the backups, while their shift supervisor was responsible for regularly
ensuring they did that job and that everything was recorded in the shift
logbook. 

Early on Thanksgiving Day an operator who was having back trouble DROPPED
THE NEW COPY of the system pack on the floor, then TRIED TO BOOT FROM IT!!!
Of course this crashed the heads into the media, destroying both. (It
evidently never occurred to the operator that anything might be damaged.)

When that boot failed, the operator proceeded to load the previous system
pack into the same drive, crashing that pack.

When that failed, the shift supervisor (somehow unaware of the pack bounce)
was called on, and attempted to restore a tape backup and in the process
(inevitably) crashed our last remaining RM05 system pack.

At this point the supervisor called me and my senior system manager (our
lead for DEC systems). By the time we both got there, our DEC FSE was on
site as well, and the whole story of the pack and drive crashes had
emerged. The FSE repaired the drive and brought us a new pack from DEC
Santa Clara. When I went to load the first system backup tape volume, I saw
from its label that it was several months old. I was also unable to find the
remaining volumes (logged in the logbook) in our tape library. (Memory here
is unclear, but I suppose that, being several months old themselves, they had been
recycled for some other purpose, probably to back up the user pack.) We
also found that the shift supervisor had not run ANY other tape backups in
months!!

We began a fresh VMS install (including multiple DEC OS and layered product
updates) and called in the application maintainer to help us with
reinstalling his application. We had to reconstruct a lot of the
system and application environment (system and layered product startup
files, batch and print queues, etc.) from his and our memory. 

In the end, we spent a straight 26 hours on site that holiday weekend,
before everything was up and the application’s input backlog (via KCT32
running our own special firmware) was being worked off.

Even in this imperfectly remembered horror story of my longest day as a VMS
system engineer, many weak points in our backup and operating procedures
are visible. Less visible (and harder to reconstruct) are the errors in the
operations department procedures and management, and, yes, even in overall
systems engineering (my team’s department) that allowed such flawed
procedures to evolve in the first place.
