[Info-vax] MACHINECHK on my XP900...
Jan-Erik Soderholm
jan-erik.soderholm at telia.com
Sun Feb 13 11:48:21 EST 2011
Jan-Erik Soderholm wrote 2011-02-13 17:32:
> Jan-Erik Soderholm wrote 2011-02-13 17:07:
>> Hans Vlems wrote 2011-02-13 10:31:
>>> On Feb 12, 5:57 pm, Jan-Erik Soderholm<jan-erik.soderh... at telia.com>
>>> wrote:
>>>> Hi.
>>>>
>>>> I have some trouble with my XP900 466 MHz system.
>>>>
>>>> Currently, when powered on, it boots (VMS 8.3) and runs
>>>> for 5-10 minutes, then I get this on the console :
>>>>
>>>> --------------------------------------------------------------------------
>>>>
>>>> **** OpenVMS Alpha Operating System V8.3 - BUGCHECK ****
>>>>
>>>> ** Bugcheck code = 00000215: MACHINECHK, Machine check while in kernel
>>>> mode
>>>> ** Crash CPU: 00000000 Primary CPU: 00000000 Node Name: OSSBY1
>>>> ** Supported CPU count: 00000001
>>>> ** Active CPUs: 00000000.00000001
>>>> ** Current Process: NULL
>>>> ** Current PSB ID: 00000001
>>>> ** Image Name:
>>>>
>>>> ** Dumping error log buffers to HBVS unit 0
>>>>
>>>> **** No supported device(s) found in DUMP_DEV
>>>> **** No DUMP_DEV devices found
>>>> **** Attempting to write the crash dump to the system disk
>>>>
>>>> --------------------------------------------------------------------------
>>>>
>>>> Before I begin fault-tracing, I thought I'd ask if there
>>>> is anything in that message that "sticks out" ?
>>>>
>>>> I had the box opened before this started and I might have
>>>> touched some RAM module or something like that. I do not know.
>>>>
>>>> Is the code = 00000215 trying to tell me something important ? :-)
>>>>
>>>> I was doing nothing in VMS, just booted and waited for the crash.
>>>>
>>>> Jan-Erik.
>>>
>>> Good morning Jan-Erik,
>>> very likely the code is trying to tell you/us something, but very few
>>> people speak that language these days....
>>> As you suggested, reseating modules (memory modules, cpu board and pci
>>> controllers) is a good start.
>>> Perhaps one of the memory modules has developed a hardware problem, so
>>> reducing the memory is an option.
>>> The system takes 5-10 minutes to crash, so it is not a straightforward
>>> hardware problem.
>>> If the system isn't doing anything then we can assume that there is no
>>> VMS related problem, right?
>>> So it may be temperature related, or an intermittent hardware
>>> problem.
>>> I never saw an XP900 (just an XP1000) but if the cpu has a fan on top
>>> of it, check whether it rotates freely.
>>> Other than that, strip the system to its minimum configuration (cd,
>>> cpu and minimal memory) and run VMS off cd.
>>> If nothing happens, add more hardware until the problem appears again.
>>> If it does, well, I would defy Murphy and suspect the cheapest
>>> components: memory ;-)
>>> Does the XP900 have a memtest command in nvram?
>>> Hans
>>
>>
>>
>> Hi again and thanks to those responding.
>> Yes, there is a "memtest" command, but I can't make it do anything
>> (I think; it silently returns to the >>> prompt).
>>
>> I found something else. SHOW POWER gives this output :
>>
>> >>> show power
>>
>> Status
>> Power Supply good
>> System Fan/PCI Fan good
>> CPU Fan good
>> Temperature good
>>
>> Current ambient temperature is 56 degrees C
>> System shutdown temperature is set to 60 degrees C
>>
>> 8 Environmental events are logged in nvram
>> Do you want to view the events? (Y/<N>) y
>>
>> Total Environmental Events: 8 (8 logged)
>>
>> 1 FEB 11 6:44 Temperature Failure
>> 2 FEB 12 16:08 Temperature Failure
>> 3 FEB 12 16:15 Temperature Failure
>> 4 FEB 12 16:19 Temperature Failure
>> 5 FEB 12 16:34 Temperature Failure
>> 6 FEB 13 16:01 Temperature Failure
>> 7 FEB 13 16:11 Temperature Failure
>> 8 FEB 13 16:15 Temperature Failure
>>
>>
>> These timestamps seem to match the crashes
>> I've had.
>>
>> I also saw that while powering on I got :
>> "System Temperature is 59 degrees C".
>>
>> The system has been powered off during the night,
>> so *it seems* as if something is wrong with the temp
>> measurement!?
>>
>> I have as a quick workaround done SET SHUTDOWN_TEMP 70
>> and we'll see if it keeps running longer.
>>
>> Has anyone seen a temp sensor go bad in a DS10 or XP900?
>>
>> Jan-Erik.
>
> This is a very similar problem description :
> http://forums13.itrc.hp.com/service/forums/questionanswer.do?threadId=1038460
>
> This page talks about temp sensor problems on DS10 :
> http://h30097.www3.hp.com/docs/updates/V51B/html/ar01s06.html
>
> Maybe time to upgrade to a newer Alpha... :-)
>
Sorry for yet another post on this issue... :-)
The temp can be read from within VMS, and it gives another value:
$ temp = f$getsyi("temperature_vector")
$ sh sym temp
TEMP = "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF38"
$
The last two chars are the temp in degC; 38 is a rather
sensible value.
I made a quick shutdown to re-check the value in console mode
and it still says "Current ambient temperature is 56 degrees C".
Weird...
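
One possible explanation, offered as an assumption and not something
confirmed in this thread: if the string returned for "temperature_vector"
is hexadecimal, with one byte per sensor and FF meaning "no sensor", then
the trailing 38 is hex 38 = 56 decimal, which would agree with the console
reading. A minimal off-box sketch in Python that decodes the string under
that assumption (the function name is made up for illustration):

```python
def decode_temperature_vector(vector):
    """Decode a hex string of per-sensor bytes, lowest-order byte first.

    Assumption (not confirmed in the post): each pair of characters is
    one hexadecimal byte in degrees C, and FF marks an absent sensor.
    """
    readings = []
    # Walk the string two hex digits at a time, starting at the right-hand end.
    for end in range(len(vector), 1, -2):
        value = int(vector[end - 2:end], 16)
        if value != 0xFF:
            readings.append(value)
    return readings


vector = "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF38"
last_two = vector[-2:]

print("read as decimal:", int(last_two, 10))  # 38
print("read as hex:    ", int(last_two, 16))  # 56
print("sensor readings:", decode_temperature_vector(vector))  # [56]
```

If that reading is right, VMS and the console report the same sensor value,
and the question becomes why the sensor itself reads 56 degrees C after a
night powered off.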