[Info-vax] Alphaserver ES47: Suspected broken CPU, unable to stop/cpu 2
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Wed Aug 8 09:43:42 EDT 2018
On 2018-08-08 06:34:54 +0000, Robin Schrievers said:
> One of our Alphaserver ES47's is showing spontaneous crashes since a
> few days. As soon as we start queues and make the box work, it will
> crash within the hour with a machinecheck.
> ...
> Any suggestions would be greatly appreciated
A remote system with no local hardware support, a very down-revision
and unsupported version of OpenVMS Alpha, and no spare server system
available for a wholesale failover? That's.... auspicious. Might
want to acquire an ES47 and air-freight that to the location, swap the
storage and get that going.
Could well be memory, processor, interconnect or who-knows-what. I've
had cables fail on Marvel-class boxes, for instance. Without access to
the hardware diagnostics and particularly the error log entries and
based solely on the not-always-helpful OpenVMS footprint, I'd guess bad
CPU or maybe bad memory. (The OpenVMS ELV tool and the Marvel
diagnostics documentation and the Marvel server gremlins don't always
agree on what's actually happening, either.)
Things to try?
Disable everything except CPU 0 and try again.
    MBM> SET CPU_ENABLED 00000001
Disable the possibly-failing Duo, as the failure of one CPU can cause
problems for the other.
    MBM> SET CPU_ENABLED FFFFFCFF
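For sanity-checking a mask before typing it at the MBM prompt, here is a minimal Python sketch that decodes which bit positions a hex value leaves set or cleared. It assumes the CPU_ENABLED value is a plain 32-bit bit field; how bit positions map onto physical CPUs and Duos is not covered here and should be checked against the MBM CLI reference. The function names are made up for illustration.

```python
# Sketch only: treats the CPU_ENABLED value as a plain 32-bit bit field.
# The mapping from bit position to physical CPU is NOT assumed here.

def enabled_bits(mask_hex, width=32):
    """Return the bit positions that are set in the hex mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(width) if (mask >> bit) & 1]

def disabled_bits(mask_hex, width=32):
    """Return the bit positions that are cleared in the hex mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(width) if not (mask >> bit) & 1]

def mask_from_disabled(disabled, width=32):
    """Build an all-ones mask with the listed bit positions cleared."""
    mask = (1 << width) - 1
    for bit in disabled:
        mask &= ~(1 << bit)
    # Zero-padded upper-case hex, one digit per four bits.
    return f"{mask:0{width // 4}X}"

if __name__ == "__main__":
    print(enabled_bits("00000001"))    # bits left set by the first mask
    print(disabled_bits("FFFFFCFF"))   # bits cleared by the second mask
    print(mask_from_disabled([8, 9]))  # rebuild a mask from cleared bits
```

Decoded this way, 00000001 leaves only bit 0 set, and FFFFFCFF clears bits 8 and 9 and leaves everything else set.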
Might also need to try reconfiguring the memory. Haven't tried
partitioning the system on an ES47, but that's one way that a
Marvel-class box can re-organize its memory.
If you can get somebody on-site, re-seat the processors, risers and
memory. Issues with Marvel-class boxes can sometimes be cured with
that "simple" expedient.
Failing that, or if re-seating doesn't help, somebody is going to need
to run the fault diagnostics and the ELV error log reports, isolate the
hardware error, and swap some boards. Or as can happen with these sorts of
servers, swap around or swap out the boards most likely involved, and
see if the gremlins migrate.
Try scrounging a copy of the MBM CLI manual, too: "AlphaServer
ES47/ES80/GS1280 Server Management, Command Line Interface CLI
Reference", Version 3.0, October 2003, was the last version around. The
filename used to be "cli_reference_v3.pdf". That particular manual
is... hard to find.
BTW, ftp.hpe.com has gone https-only (or maybe https and sftp, didn't
try that), and now you have to know the filename and full path of the
target file to fetch anything from that server.
--
Pure Personal Opinion | HoffmanLabs LLC