[Info-vax] Alphaserver ES47: Suspected broken CPU, unable to stop/cpu 2

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Wed Aug 8 09:43:42 EDT 2018


On 2018-08-08 06:34:54 +0000, Robin Schrievers said:

> One of our Alphaserver ES47's is showing spontaneous crashes since a 
> few days. As soon as we start queues and make the box work, it will 
> crash within the hour with a machinecheck.
> ...
> Any suggestions would be greatly appreciated

A remote system with no local hardware support, a very down-revision 
and unsupported version of OpenVMS Alpha, and no spare server system 
available for a wholesale failover?   That's.... auspicious.  Might 
want to acquire an ES47 and air-freight that to the location, swap the 
storage and get that going.

Could well be memory, processor, interconnect or who-knows-what.  I've 
had cables fail on Marvel-class boxes, for instance.  Without access to 
the hardware diagnostics and particularly the error log entries and 
based solely on the not-always-helpful OpenVMS footprint, I'd guess bad 
CPU or maybe bad memory.  (The OpenVMS ELV tool and the Marvel 
diagnostics documentation and the Marvel server gremlins don't always 
agree on what's actually happening, either.)

Things to try?
Disable everything except CPU 0 and try again.  MBM> SET CPU_ENABLED 00000001

Disable the possibly-failing Duo, as the failure of one CPU can cause 
problems for the other.  MBM> SET CPU_ENABLED FFFFFCFF
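
The two masks above are easier to check when decoded bit by bit. Below is a minimal Python sketch, assuming the CPU_ENABLED value is a 32-bit hex mask in which bit N enables CPU N. That mapping is an assumption and is not confirmed here, nor is how the MBM numbers CPUs relative to the OpenVMS CPU IDs, so check against the MBM CLI reference before trusting it. The helper names are hypothetical and are not part of any MBM or OpenVMS tool.

```python
# Decode and build hex bitmasks of the kind passed to SET CPU_ENABLED.
# ASSUMPTION: bit N of the mask corresponds to CPU N (not confirmed by the post).

def enabled_bits(mask_hex: str, width: int = 32) -> list[int]:
    """Return the bit positions that are set in the mask, lowest first."""
    value = int(mask_hex, 16)
    return [bit for bit in range(width) if value & (1 << bit)]


def cleared_bits(mask_hex: str, width: int = 32) -> list[int]:
    """Return the bit positions that are clear in the mask, lowest first."""
    value = int(mask_hex, 16)
    return [bit for bit in range(width) if not value & (1 << bit)]


def mask_without(bits: list[int], width: int = 32) -> str:
    """Build an all-ones mask of the given width with the listed bits cleared."""
    value = (1 << width) - 1
    for bit in bits:
        value &= ~(1 << bit)
    return f"{value:0{width // 4}X}"


if __name__ == "__main__":
    print(enabled_bits("00000001"))   # bits left enabled by the first mask
    print(cleared_bits("FFFFFCFF"))   # bits disabled by the second mask
    print(mask_without([8, 9]))       # rebuild a mask from the bits to disable
```

Decoding a mask before typing it at the MBM prompt is a cheap way to confirm that it disables the pair of CPUs intended, and nothing else.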

Might also need to try reconfiguring the memory.  Haven't tried 
partitioning the system on an ES47, but that's one way that a 
Marvel-class box can re-organize its memory.

If you can get somebody on-site, re-seat the processors, risers and 
memory.  Issues with Marvel-class boxes can sometimes be cured with 
that "simple" expedient.

If not and otherwise, somebody is going to need to run the fault 
diagnostics and the ELV error log reports and isolate the hardware 
error, and swap some boards.  Or as can happen with these sorts of 
servers, swap around or swap out the boards most likely involved, and 
see if the gremlins migrate.

Try scrounging a copy of the MBM CLI manual, too: "AlphaServer 
ES47/ES80/GS1280 Server Management, Command Line Interface CLI 
Reference," Version 3.0, October 2003, was the last version around.  The 
filename used to be "cli_reference_v3.pdf".  That particular manual 
is... hard to find.

BTW, ftp.hpe.com has gone https-only (or maybe https and sftp, didn't 
try that), and now you have to know the filename and full path of the 
target file to fetch anything from that server.


-- 
Pure Personal Opinion | HoffmanLabs LLC 



