[Info-vax] MACHINECHK on my XP900...
Jan-Erik Soderholm
jan-erik.soderholm at telia.com
Sun Feb 13 11:07:05 EST 2011
Hans Vlems wrote 2011-02-13 10:31:
> On Feb 12, 5:57 pm, Jan-Erik Soderholm<jan-erik.soderh... at telia.com>
> wrote:
>> Hi.
>>
>> I have some trouble with my XP900 466 MHz system.
>>
>> Currently, when powered on, it boots (VMS 8.3) and runs
>> for 5-10 minutes, then I get this on the console :
>>
>> --------------------------------------------------------------------------
>>
>> **** OpenVMS Alpha Operating System V8.3 - BUGCHECK ****
>>
>> ** Bugcheck code = 00000215: MACHINECHK, Machine check while in kernel mode
>> ** Crash CPU: 00000000 Primary CPU: 00000000 Node Name: OSSBY1
>> ** Supported CPU count: 00000001
>> ** Active CPUs: 00000000.00000001
>> ** Current Process: NULL
>> ** Current PSB ID: 00000001
>> ** Image Name:
>>
>> ** Dumping error log buffers to HBVS unit 0
>>
>> **** No supported device(s) found in DUMP_DEV
>> **** No DUMP_DEV devices found
>> **** Attempting to write the crash dump to the system disk
>>
>> --------------------------------------------------------------------------
>>
>> Before I begin fault-tracing, I thought I'd ask if there
>> is anything in that message that "sticks out" ?
>>
>> I had the box opened before this started and I might have
>> touched some RAM module or something like that. I do not know.
>>
>> Is the code = 00000215 trying to tell me something important ? :-)
>>
>> I was doing nothing in VMS, just booted and waited for the crash.
>>
>> Jan-Erik.
>
> Good morning Jan-Erik,
> very likely the code is trying to tell you/us something, but very few
> people speak that language these days....
> As you suggested, reseating modules (memory modules, cpu board and pci
> controllers) is a good start.
> Perhaps one of the memory modules has developed a hardware problem, so
> reducing the memory is an option.
> The system takes 5-10 minutes to crash, so it is not a straightforward
> hardware problem.
> If the system isn't doing anything then we can assume that there is no
> VMS related problem, right?
> So it may be temperature related, or an intermittent hardware
> problem.
> I never saw an XP900 (just an XP1000) but if the cpu has a fan on top
> of it, check whether it rotates freely.
> Other than that, strip the system to its minimum configuration (cd,
> cpu and minimal memory) and run VMS off cd.
> If nothing happens, add more hardware until the problem appears again.
> If it does, well, I would defy Murphy and suspect the cheapest
> components: memory ;-)
> Does the XP900 have a memtest command in nvram?
> Hans
Hi again, and thanks to those responding.
Yes, there is a "memtest" command, but I can't make it do anything
(I think it just silently returns to the >>> prompt).
I found something else. SHOW POWER gives this output:
>>>show power
Status
Power Supply good
System Fan/PCI Fan good
CPU Fan good
Temperature good
Current ambient temperature is 56 degrees C
System shutdown temperature is set to 60 degrees C
8 Environmental events are logged in nvram
Do you want to view the events? (Y/<N>) y
Total Environmental Events: 8 (8 logged)
1 FEB 11 6:44 Temperature Failure
2 FEB 12 16:08 Temperature Failure
3 FEB 12 16:15 Temperature Failure
4 FEB 12 16:19 Temperature Failure
5 FEB 12 16:34 Temperature Failure
6 FEB 13 16:01 Temperature Failure
7 FEB 13 16:11 Temperature Failure
8 FEB 13 16:15 Temperature Failure
These timestamps seem to match the crashes
I've had.
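Lining the logged events up by day makes the pattern easier to see. Below is a minimal Python sketch that parses the event lines from the SHOW POWER listing, copied verbatim, and reports the minutes between consecutive Temperature Failures on each day. The console log does not include a year, so 2011 is assumed from the post date. `parse_events` and `gaps_by_day` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import datetime

# Environmental event lines as printed by the console's SHOW POWER command.
# The log has no year field; 2011 is an assumption taken from the post date.
EVENTS = """\
1 FEB 11 6:44 Temperature Failure
2 FEB 12 16:08 Temperature Failure
3 FEB 12 16:15 Temperature Failure
4 FEB 12 16:19 Temperature Failure
5 FEB 12 16:34 Temperature Failure
6 FEB 13 16:01 Temperature Failure
7 FEB 13 16:11 Temperature Failure
8 FEB 13 16:15 Temperature Failure
"""


def parse_events(text, year=2011):
    """Return a list of (index, timestamp, description) tuples."""
    events = []
    for line in text.strip().splitlines():
        idx, mon, day, hhmm, desc = line.split(maxsplit=4)
        # %b matches "FEB" case-insensitively; %H accepts a one-digit hour.
        ts = datetime.strptime(f"{day} {mon} {year} {hhmm}", "%d %b %Y %H:%M")
        events.append((int(idx), ts, desc))
    return events


def gaps_by_day(events):
    """Group events by calendar day; return minutes between consecutive events."""
    days = defaultdict(list)
    for _, ts, _ in events:
        days[ts.date()].append(ts)
    result = {}
    for day, stamps in sorted(days.items()):
        stamps.sort()
        result[day] = [int((b - a).total_seconds() // 60)
                       for a, b in zip(stamps, stamps[1:])]
    return result


if __name__ == "__main__":
    for day, gaps in gaps_by_day(parse_events(EVENTS)).items():
        print(day, "events:", len(gaps) + 1, "gaps (min):", gaps)
```

On this data the gaps on 12 and 13 February are between 4 and 15 minutes, which is roughly in line with a boot followed by 5-10 minutes of running before each crash.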
I also noticed that at power-on I got:
"System Temperature is 59 degrees C".
The system had been powered off during the night,
so *it seems* as if something is weird with the temperature
measurement!?
As a quick workaround I have done SET SHUTDOWN_TEMP 70,
and we'll see if it keeps running longer.
Has anyone seen a temperature sensor go bad in a DS10 or XP900?
Jan-Erik.