[Info-vax] MACHINECHK on my XP900...
Jan-Erik Soderholm
jan-erik.soderholm at telia.com
Sun Feb 13 11:32:08 EST 2011
Jan-Erik Soderholm wrote 2011-02-13 17:07:
> Hans Vlems wrote 2011-02-13 10:31:
>> On Feb 12, 5:57 pm, Jan-Erik Soderholm<jan-erik.soderh... at telia.com>
>> wrote:
>>> Hi.
>>>
>>> I have some trouble with my XP900 466 MHz system.
>>>
>>> Currently, when powered on, it boots (VMS 8.3) and runs
>>> for 5-10 minutes, then I get this on the console :
>>>
>>> --------------------------------------------------------------------------
>>>
>>> **** OpenVMS Alpha Operating System V8.3 - BUGCHECK ****
>>>
>>> ** Bugcheck code = 00000215: MACHINECHK, Machine check while in kernel mode
>>> ** Crash CPU: 00000000 Primary CPU: 00000000 Node Name: OSSBY1
>>> ** Supported CPU count: 00000001
>>> ** Active CPUs: 00000000.00000001
>>> ** Current Process: NULL
>>> ** Current PSB ID: 00000001
>>> ** Image Name:
>>>
>>> ** Dumping error log buffers to HBVS unit 0
>>>
>>> **** No supported device(s) found in DUMP_DEV
>>> **** No DUMP_DEV devices found
>>> **** Attempting to write the crash dump to the system disk
>>>
>>> --------------------------------------------------------------------------
>>>
>>> Before I begin fault-tracing, I thought I'd ask if there
>>> is anything in that message that "sticks out" ?
>>>
>>> I had the box opened before this started and I might have
>>> touched some RAM module or something like that. I do not know.
>>>
>>> Is the code = 00000215 trying to tell me something important ? :-)
>>>
>>> I was doing nothing in VMS; I just booted and waited for the crash.
>>>
>>> Jan-Erik.
>>
>> Good morning Jan-Erik,
>> very likely the code is trying to tell you/us something, but very few
>> people speak that language these days....
>> As you suggested, reseating modules (memory modules, cpu board and pci
>> controllers) is a good start.
>> Perhaps one of the memory modules has developed a hardware problem, so
>> reducing the memory is an option.
>> The system takes 5-10 minutes to crash, so it is not a straightforward
>> hardware problem.
>> If the system isn't doing anything, then we can assume that there is no
>> VMS-related problem, right?
>> So it may be temperature related, or an intermittent hardware
>> problem.
>> I never saw an XP900 (just an XP1000) but if the cpu has a fan on top
>> of it, check whether it rotates freely.
>> Other than that, strip the system to its minimum configuration (cd,
>> cpu and minimal memory) and run VMS off cd.
>> If nothing happens, add more hardware until the problem appears again.
>> If it does, well, I would defy Murphy and suspect the cheapest
>> components: memory ;-)
>> Does the XP900 have a memtest command in nvram?
>> Hans
>
>
>
> Hi again, and thanks to those responding.
> Yes, there is a "memtest" command, but I can't make it do anything
> (as far as I can tell, it silently returns to the >>> prompt).
>
> I found something else. SHOW POWER gives this output:
>
>>>> show power
>
> Status
> Power Supply good
> System Fan/PCI Fan good
> CPU Fan good
> Temperature good
>
> Current ambient temperature is 56 degrees C
> System shutdown temperature is set to 60 degrees C
>
> 8 Environmental events are logged in nvram
> Do you want to view the events? (Y/<N>) y
>
> Total Environmental Events: 8 (8 logged)
>
> 1 FEB 11 6:44 Temperature Failure
> 2 FEB 12 16:08 Temperature Failure
> 3 FEB 12 16:15 Temperature Failure
> 4 FEB 12 16:19 Temperature Failure
> 5 FEB 12 16:34 Temperature Failure
> 6 FEB 13 16:01 Temperature Failure
> 7 FEB 13 16:11 Temperature Failure
> 8 FEB 13 16:15 Temperature Failure
>
>
> These timestamps seem to match the times of the crashes
> I've had.
>
> I also saw that while powering on I got :
> "System Temperature is 59 degrees C".
>
> The system was powered off during the night,
> so *it seems* as if something is wrong with the temperature
> measurement!?
>
> As a quick workaround I have done SET SHUTDOWN_TEMP 70,
> and we'll see if it keeps running longer.
>
> Has anyone seen a temperature sensor go bad in a DS10 or XP900?
>
> Jan-Erik.
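For anyone who wants to check the numbers in the SHOW POWER output quoted above, here is a minimal Python sketch. It assumes the console text has been captured verbatim as shown, and that the year is 2011 (taken from the post date, since the event log itself omits it). The function names are hypothetical illustrations, not part of any console or VMS tool. It only computes the headroom between the reported temperature and the shutdown limit and the minutes between same-day "Temperature Failure" events; it does not say whether the sensor or the cooling is at fault.

```python
import re
from datetime import datetime

# Excerpt of the SHOW POWER output quoted in this thread (captured verbatim).
SHOW_POWER = """\
Current ambient temperature is 56 degrees C
System shutdown temperature is set to 60 degrees C

1 FEB 11 6:44 Temperature Failure
2 FEB 12 16:08 Temperature Failure
3 FEB 12 16:15 Temperature Failure
4 FEB 12 16:19 Temperature Failure
5 FEB 12 16:34 Temperature Failure
6 FEB 13 16:01 Temperature Failure
7 FEB 13 16:11 Temperature Failure
8 FEB 13 16:15 Temperature Failure
"""

TEMP_RE = re.compile(r"Current ambient temperature is (\d+) degrees C")
LIMIT_RE = re.compile(r"System shutdown temperature is set to (\d+) degrees C")
# Event lines look like: "<n> <MON> <day> <hh>:<mm> <description>"
EVENT_RE = re.compile(
    r"^[ \t]*(\d+)[ \t]+([A-Z]{3})[ \t]+(\d+)[ \t]+(\d+):(\d+)[ \t]+(.+?)[ \t]*$",
    re.M,
)
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def parse_show_power(text, year=2011):
    """Return (current temp, shutdown limit, [(datetime, description), ...]).

    The year is an assumption: the nvram event log does not record it.
    """
    temp = int(TEMP_RE.search(text).group(1))
    limit = int(LIMIT_RE.search(text).group(1))
    events = []
    for _, mon, day, hh, mm, desc in EVENT_RE.findall(text):
        when = datetime(year, MONTHS.index(mon) + 1, int(day), int(hh), int(mm))
        events.append((when, desc))
    return temp, limit, events


def same_day_gaps_minutes(events):
    """Minutes between consecutive events that fall on the same calendar day."""
    gaps = []
    for (a, _), (b, _) in zip(events, events[1:]):
        if a.date() == b.date():
            gaps.append(int((b - a).total_seconds() // 60))
    return gaps


if __name__ == "__main__":
    temp, limit, events = parse_show_power(SHOW_POWER)
    print(f"headroom: {limit - temp} C")
    print("gaps (min):", same_day_gaps_minutes(events))
```

Run as a script, it reports 4 degrees C of headroom (56 against a 60 limit) and same-day gaps of 7, 4, 15, 10 and 4 minutes, loosely in line with the 5-10 minutes of uptime before each crash described earlier. With SET SHUTDOWN_TEMP 70 the same 56 degree reading would leave 14 degrees of headroom.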
Here is a description of a very similar problem:
http://forums13.itrc.hp.com/service/forums/questionanswer.do?threadId=1038460
This page talks about temperature sensor problems on the DS10:
http://h30097.www3.hp.com/docs/updates/V51B/html/ar01s06.html
Maybe it's time to upgrade to a newer Alpha... :-)