[Info-vax] Alphaserver ES47: Suspected broken CPU, unable to stop/cpu 2
Robin Schrievers
robin.schrievers at meteogroup.com
Thu Aug 9 04:01:24 EDT 2018
Meanwhile, Roy has been a great help analysing the errlog.sys file from the latest crash. This is the result:
Event: 2
Description: UnCorrectable System Event at Mon 6 Aug 2018 10:35:36 GMT+00:00 from THMG03
File: ./clue$errlog.sys
================================================================================
COMMON EVENT HEADER (CEH) V2.0
Event_Leader xFFFF FFFE
Header_Length 308
Event_Length 1,416
Header_Rev_Major 2
Header_Rev_Minor 0
OS_Type 2 -- OpenVMS
Hardware_Arch 4 -- Alpha
CEH_Vendor_ID 3,564 -- Hewlett-Packard Company
Hdwr_Sys_Type 39 -- hp AlphaServer EV7 Series
Logging_CPU 2 -- CPU Logging this Event
CPUs_In_Active_Set 0
Major_Class 27
Minor_Class 0
Entry_Type 660 -- UnCorrectable System Event
DSR_Msg_Num 2,030 -- AlphaServer ES47
Chip_Type 15 -- EV7 - 21364
CEH_Device 0
CEH_Device_ID_0 x0000 0000
CEH_Device_ID_1 x0000 0000
CEH_Device_ID_2 x0000 0000
Unique_ID_Count 27,300
Unique_ID_Prefix 0
Num_Strings 7
TLV Section of CEH
TLV_Sys_Serial_Num IE84200129
TLV_DSR_String hp AlphaServer ES47 7/1150
TLV_Time_as_Local Mon 6 Aug 2018 10:35:36 GMT+00:00
TLV_OS_Version V7.3-2
TLV_Computer_Name THMG03
Entry_Type 660
SysType39_Processing
START OF SUBPACKETS IN THIS EVENT
Logout Header Frame
Frame_ID_Flags x0000 0011
System_Type_ID[15:0]x11 GS1280 Series - 2 P Drawer
Recoverable_Flag[28]x0
Second_Error_Flag[30]x0
Retry_Flag[31] x0
CPU_Offset x0000 0060
CPU_Offset[31:0] x60
System_Offset x0000 0268
System_Offset[31:0] x268
PAL Specific Subpacket
Machine_Check_code x0000 0202
Mchk_Check_Code[31:0]x202 * 660 - SYSTEM DETECTED UNCORRECTABLE ERROR
Subpacket_Count x0000 0004
Subpkt_Count[31:0] x4
Processor_WHAMI x0000 0000 0000 0002
CPU_WHAMI[7:0] x2 CPU 2
RBOX_WHOAMI x0000 0000 0000 0002
Physical_CPU_Loc[7:0]x2 If 8P Configuration - Cabinet 0 - Drawer 1 - Module 0 - CPU 0
RBOX_INT x0000 0000 0880 0108 RBOX Interrupt Status
Z1CRD[3] x1 * A ZBOX1 CORRECTABLE ERROR INTERRUPT HAS OCCURRED
RPERF[8] x1 An RBOX performance counter interrupt has occurred
Z1UCE[27] x1 * A ZBOX1 UNCORRECTABLE ERROR INTERRUPT HAS OCCURRED
Exc_Addr x0000 0000 0003 D7F0 Exception Address Register
EXC_ADDR[63:0] x3 D7F0 Exception Address
Time_Stamp x0000 1208 060A 120C
Second[7:0] xC
Minute[15:8] x12
Hour[23:16] xA
Day[31:24] x6
Month[39:32] x8
Year[47:40] x12
Halt_Code_Reason x0000 0000 0000 1000
Halt_Reason_Code[15:0]x1000 * MACHINE CHECK CALL IN OPERATING SYSTEM
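The Time_Stamp quadword above is already split into fields by the log (Second[7:0] through Year[47:40]). The sketch below repeats that unpacking so the raw value can be checked by hand. It assumes the fields are plain binary (x12 for the year only gives 2018 that way) and that the year counts from 2000. Both are assumptions taken from this one log, not from any HP documentation. The helper names `field` and `decode_time_stamp` are hypothetical.

```python
# Sketch: unpack the Time_Stamp quadword using the bit layout printed in
# the log. ASSUMPTIONS: fields are binary (not BCD), year is offset from 2000.
from datetime import datetime


def field(value: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_time_stamp(raw: int) -> datetime:
    return datetime(
        year=2000 + field(raw, 47, 40),  # assumed offset from 2000
        month=field(raw, 39, 32),
        day=field(raw, 31, 24),
        hour=field(raw, 23, 16),
        minute=field(raw, 15, 8),
        second=field(raw, 7, 0),
    )


if __name__ == "__main__":
    print(decode_time_stamp(0x0000_1208_060A_120C).isoformat())
```

Under those assumptions the stamp reads 2018-08-06 10:18:12.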
Processor Subpacket
I_STAT x0000 0025 0000 0000 I BOX Status Register
OVR[32:30] x4 ProfileMe Counter 0 Overcount
ICM[33] x0 ProfileMe Icache Miss
TRAP_TYPE[37:34] x9 See PMPC <14:0>
LS0[38] x0 ProfileMe Load - Mbox Load-Store Order Replay Trap
TRP[39] x0 ProfileMe Trap - Check Trap Type [3:0]
MSI[40] x0 ProfileMe Mispredict Trap
DC_STAT x0000 0000 0000 0000 Dcache Status Register
C_Addr x0000 07E8 FFB0 2000 Cbox Address
ERR_ADR[42:0] x7E8 FFB0 2000 Error Address
C_Syndrome_1 x0000 0000 0000 0000 Syndrome for Upper Quadword
C_Syndrome_1[8:0] x0
C_Syndrome_0 x0000 0000 0000 0000 Syndrome for Lower Quadword
C_STAT x0000 0000 0000 0000 Cbox Status
C_STAT[4:0] x0
C_STS x0000 0000 0000 0000 Cbox Block Status
C_STS[3:0] x0 Shared
MM_STAT x0000 0000 0000 00B0 Memory Management Status
OPCODE[9:4] xB OpCode that caused the ERROR
Exc_Addr x0000 0000 0003 D7F0 Exception Address Register
EXC_ADDR[63:0] x3 D7F0 Exception Address
IER_CM x0000 007E FFFF E018 Interrupt Enable and Current Mode
CM[4:3] x3 User Mode
ASTEN[13] x1 AST Interrupt Enable
SIEN[28:14] x7FFF Software Interrupt Enables
PCEN[30:29] x3 Performance Counter Interrupt Enable
CREN[31] x1 Correctable Read Error Interrupt Enable
EIEN[38:33] x3F External Interrupt Enables
ISUM x0000 0042 0000 0000 Interrupt Summary Register
EI[38:33] x21 External Interrupts
PAL_BASE x0000 0008 0003 0000 PAL Base Register
PAL_BASE[43:15] x10 0006 Base Physical Address for PAL Code
I_CTL xFFFF FEFA 0430 0386 Ibox Control
SPCE[0] x0 System Performance Counting Enable
IC_EN[2:1] x3 Icache Set Enable
SPE[5:3] x0 Super Page Mode Enable
RPM[6] x0 Reduced Page Mode
SDE[7] x1 Access to PAL Shadow Registers Enabled
SBE[9:8] x3 Stream Buffer Enable
BP_MODE[11:10] x0 Branch Prediction Mode
ST_WAIT_64K[20] x1 stWait Table cleared after 64K Cycles
MCHK_EN[21] x1 Machine Checks are Enabled
BIST_FAIL[23] x0 BIST has run Successfully
CHIP_ID[29:24] x4 Revision ID of CPU Chip
VPTB[47:30] x3 FBE8 Virtual Page Table Base
SEXT_VPTB_47[63:48] xFFFF Sign Extended VPTB <47>
Process_Context x0000 3D80 0000 01E4 Process Context
PPCE[1] x0 Process Performance Counting Enable
FPE[2] x1 Floating Point Enable
ASTER[8:5] xF AST Enable Register
ASTRR[12:9] x0 AST Request Register
ASN[46:39] x7B Address Space Number
CBOX_CTL x1600 0000 1004 A802 Cbox Control
PID[7:0] x2 Processor ID
PAGE_MIGR_FAST[9] x0 16 events between migration samples
CACHE_ISTM[13] x1 Cache istream fills in bcache
ENA_ECC[15] x1 ECC checking is enabled
ACC_CLUMP[17] x0 Local/Global access checks are for a single processor
LPACC[18] x1 Bypass local memory access checks
PRBQ_STXC_DIS[28] x1 PRBQ treats StoD_STxC's as StoD's
OCLA_ENA[63] x0 OCLA is disabled (copy of QBOX bit)
CBOX_STP_CTL x0000 0000 0000 0000 Cbox Stripe Control
STP[63:0] x0
CBOX_ACC_CTL x0000 0000 0000 0000 Cbox Access Control
ACC[63:0] x0
CBOX_LCL_SET xFFFF FFFF FFFF FFFF Cbox Local Processor Set
LCL[63:0] xFFFF FFFF FFFF FFFF
CBOX_GBL_SET x0000 0000 0000 0000 Cbox Global Processor Set
GBL[63:0] x0
BBOX_CTL x0000 0000 0001 C67F Bbox Control
SET_ENA[6:0] x7F L2 Cache Set Enable
EVICT_NEXT[10:8] x6 Evict Next Set
BC_STS_PAR_ENA[14] x1 L2 Cache sts Parity Check Enable
BC_TAG_PAR_ENA[15] x1 L2 Cache Tag Parity Check Enable
TTAG_PARITY_ENA[16] x1 Ttag Parity Check Enable
BBOX_ERR_STS x0000 0000 0000 0000 Bbox Error Status
BSTS_PAR[6:0] x0
BTAG_PAR[14:8] x0
TTAG_PAR[23:16] x0
BBOX_ERR_MASK[26:24]x0 No Error Occurred
BBOX_ERR_IDX x0000 0000 0003 FBC0 Bbox Error Index
IDX[17:6] xFEF L2 Cache locked / index of most recent Parity Error
CBOX_DDP_ERR_STS x0000 0000 0000 0000 Cbox Data Path Error Status
CBOX_ERR_MASK[2:0] x0 No Error Occurred
ERR_SET[5:3] x0 L2 Cache Set with Error
ERR_IDX[17:6] x0 L2 Cache Set Index with Error
ERR_SYN[28:20] x0
SIDE[29] x0 DP0 or DP1
BBOX_DAT_RMP x0000 0000 0000 0000 Bbox Data Remap Information
NE_REMAP[6:0] x0
NW_REMAP[14:8] x0
SW_REMAP[22:16] x0
SE_REMAP[30:24] x0
BTAG_BAD[31] x0 Btag BIST Passed
ZBOX Subpacket
ZBOX0_DRAM_ERR_Status1 x0000 0000
DIR_ERRSYN[10:5] x0
DAT_ERRSYN1[22:14] x0
DAT_ERRSYN0[31:23] x0
ZBOX0_DRAM_ERR_Status2 x0000 000B
TEMPCAL_CHAN[4:0] xB
TEMPCAL_DEV[9:5] x0
DAT_ERRSYN3[22:14] x0
DAT_ERRSYN2[31:23] x0
ZBOX0_DRAM_ERR_Status3 x0000 0000
ZBOX0_DRAM_ERROR_CTL x2082 133F
RAID_ON[24] x0 RAID channel is not in use
ZBOX0_DRAM_ERR_ADR x0000 0000
ERR_ADDR[28:0] x0
ZBOX0_DIFT_Timeout x8400 0000
ZBOX0_DRAM_MAPPER_CTL x2E21 CC8B
ZBOX0_FRC_ERR_ADR x0000 0001
ZBOX0_DIFT_ERR_Status x0000 0000
ZBOX1_DRAM_ERR_Status1 x0000 0000
DIR_ERRSYN[10:5] x0
DAT_ERRSYN1[22:14] x0
DAT_ERRSYN0[31:23] x0
ZBOX1_DRAM_ERR_Status2 x0800 0003
TEMPCAL_CHAN[4:0] x3
TEMPCAL_DEV[9:5] x0
DAT_ERRSYN3[22:14] x0
DAT_ERRSYN2[31:23] x10 OctaWord2 ECC Check Bit 4
ZBOX1_DRAM_ERR_Status3 x0000 2002
SGL[1] x1 * ONE OR TWO SINGLE BIT ECC ERRORS WERE DETECTED ON A MEMORY READ
MEO[13] x1 * A SECOND CORRECTABLE ERROR OCCURRED FOR WHICH NO PHYSICAL ADDRESS WAS LATCHED
ZBOX1_DRAM_ERROR_CTL x2082 133F
RAID_ON[24] x0 RAID channel is not in use
ZBOX1_DRAM_ERR_ADR x0000 0000
ERR_ADDR[28:0] x0
ZBOX1_DIFT_Timeout x8400 0000
ZBOX1_DRAM_MAPPER_CTL x2E21 CC8B
ZBOX1_FRC_ERR_ADR x0000 0000
ZBOX1_DIFT_ERR_Status x0000 0000
CBOX_CTL x1600 0000 1004 A802 Cbox Control
PID[7:0] x2 Processor ID
PAGE_MIGR_FAST[9] x0 16 events between migration samples
CACHE_ISTM[13] x1 Cache istream fills in bcache
ENA_ECC[15] x1 ECC checking is enabled
ACC_CLUMP[17] x0 Local/Global access checks are for a single processor
LPACC[18] x1 Bypass local memory access checks
PRBQ_STXC_DIS[28] x1 PRBQ treats StoD_STxC's as StoD's
OCLA_ENA[63] x0 OCLA is disabled (copy of QBOX bit)
CBOX_STP_CTL x0000 0000 0000 0000 Cbox Stripe Control
STP[63:0] x0
ZBOX0_ERROR_PA x0000 0008 0000 0000
ZBOX1_ERROR_PA x0000 0008 0000 0000
ZBOX0_ORED_SYNDROME x0000 0000 0000 0000
ZBOX1_ORED_SYNDROME x0000 0000 0000 0000
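The suspicion about the memory controller rests on a few bits in the ZBOX1 status registers: SGL[1] and MEO[13] in ZBOX1_DRAM_ERR_Status3, and the non-zero DAT_ERRSYN2 syndrome in ZBOX1_DRAM_ERR_Status2. ZBOX0 shows none of these. The sketch below extracts those fields using only the bit positions and descriptions printed in this log. The flag table is transcribed from the log and is not a complete register definition. All function names are hypothetical.

```python
# Sketch: decode the ZBOX1 DRAM error status registers using only the bit
# positions shown in the log. Bits the log does not name are reported by number.

ZBOX_ERR_STATUS3_FLAGS = {
    1: "SGL: one or two single-bit ECC errors detected on a memory read",
    13: "MEO: second correctable error, no physical address latched",
}


def field(value: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def set_bits(value: int) -> list[int]:
    """Positions of all set bits, lowest first."""
    return [bit for bit in range(value.bit_length()) if value >> bit & 1]


def describe_status3(value: int) -> list[str]:
    return [ZBOX_ERR_STATUS3_FLAGS.get(bit, f"bit {bit}: not named in this log")
            for bit in set_bits(value)]


def describe_status2(value: int) -> dict[str, int]:
    return {
        "TEMPCAL_CHAN": field(value, 4, 0),
        "TEMPCAL_DEV": field(value, 9, 5),
        "DAT_ERRSYN3": field(value, 22, 14),
        "DAT_ERRSYN2": field(value, 31, 23),
    }


if __name__ == "__main__":
    for line in describe_status3(0x0000_2002):  # ZBOX1_DRAM_ERR_Status3
        print(line)
    print(describe_status2(0x0800_0003))        # ZBOX1_DRAM_ERR_Status2
    print(describe_status3(0x0000_0000))        # ZBOX0_DRAM_ERR_Status3: []
```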
RBOX Subpacket
RBOX_CFG x0000 0000 C976 C6C1
SPD[3:0] x1 Pin Speed Ratio is 1.5 GClk Cycles Per Data Beat
PRI[4] x0 East/West is the primary axis
ADA[6] x1 Enable adaptive routing
ADB[7] x1 Enable adaptive buffers
BRO[12:9] x3 Broadcasts will be forwarded to the N & S IP ports
SYF[16:14] x3 SYNCH interval = 16K, period = 4M
RSE[17] x1 Target Reservations are enabled
STI[20:18] x5 Starvation interval = 1023
DRE[21] x1 Drain is enabled
DRI[24:22] x5 Drain Interval = 1023
TAS[26:25] x0 InvalAck Table A Select = N
TBS[28:27] x1 InvalAck Table B Select = S
TCS[30:29] x2 InvalAck Table C Select = E
IRW[31] x1 Writes to the router table will be ignored
RBOX_N_CFG x0000 0000 0000 3715 North IP port configuration
OE[0] x1 Port is enabled and driving output data
IE[2] x1 Port is enabled and is interpreting input port bits
BRO[6:3] x2 Broadcasts will be forwarded to the S IP port
BIL[8] x1 Broadcast Inval Limit = 16
HAE[9] x1 Receive HW ALERT is enabled
SAE[10] x1 Receive SW ALERT is enabled
SYC[11] x0 Receive SYNCH is disabled
CLC[12] x1 Fclk checking is enabled
ECC[13] x1 ECC checking/correction is enabled
UNI[15:14] x0 Unload pointer init value (for clock-forward reset)
FEM[19:18] x0 Normal operation
TUN[22:21] x0 Normal port operation
RBOX_S_CFG x0000 0000 0000 360D South IP port configuration
OE[0] x1 Port is enabled and driving output data
IE[2] x1 Port is enabled and is interpreting input port bits
BRO[6:3] x1 Broadcasts will be forwarded to the N IP port
BIL[8] x0 Broadcast Inval Limit = 8
HAE[9] x1 Receive HW ALERT is enabled
SAE[10] x1 Receive SW ALERT is enabled
SYC[11] x0 Receive SYNCH is disabled
CLC[12] x1 Fclk checking is enabled
ECC[13] x1 ECC checking/correction is enabled
UNI[15:14] x0 Unload pointer init value (for clock-forward reset)
FEM[19:18] x0 Normal operation
TUN[22:21] x0 Normal port operation
RBOX_E_CFG x0000 0000 0000 0618 East IP port configuration
OE[0] x0 Output port is disabled - messages destined to pass through this output port are "black-holed"
IE[2] x0 Port is disabled or has Black Holed - input port bits are being ignored
BRO[6:3] x3 Broadcasts will be forwarded to the S & N IP ports
BIL[8] x0 Broadcast Inval Limit = 8
HAE[9] x1 Receive HW ALERT is enabled
SAE[10] x1 Receive SW ALERT is enabled
SYC[11] x0 Receive SYNCH is disabled
CLC[12] x0 * FCLK CHECKING IS DISABLED
ECC[13] x0 * ECC CHECKING/CORRECTION IS DISABLED
UNI[15:14] x0 Unload pointer init value (for clock-forward reset)
FEM[19:18] x0 Normal operation
TUN[22:21] x0 Normal port operation
RBOX_W_CFG x0000 0000 0000 0618 West IP port configuration
OE[0] x0 Output port is disabled - messages destined to pass through this output port are "black-holed"
IE[2] x0 Port is disabled or has Black Holed - input port bits are being ignored
BRO[6:3] x3 Broadcasts will be forwarded to the S & N IP ports
BIL[8] x0 Broadcast Inval Limit = 8
HAE[9] x1 Receive HW ALERT is enabled
SAE[10] x1 Receive SW ALERT is enabled
SYC[11] x0 Receive SYNCH is disabled
CLC[12] x0 * FCLK CHECKING IS DISABLED
ECC[13] x0 * ECC CHECKING/CORRECTION IS DISABLED
UNI[15:14] x0 Unload pointer init value (for clock-forward reset)
FEM[19:18] x0 Normal operation
TUN[22:21] x0 Normal port operation
RBOX_N_ERR x0000 0000 0000 0064 RBOX North Port Error Status
SYN[8:2] x19 North IP Port Data Bit 21
RBOX_S_ERR x0000 0000 0000 00D0 RBOX South Port Error Status
SYN[8:2] x34 South IP Port Data Bit 15
RBOX_E_ERR x0000 0000 0000 0000 RBOX East Port Error Status
SYN[8:2] x0
RBOX_W_ERR x0000 0000 0000 0000 RBOX West Port Error Status
SYN[8:2] x0
RBOX_IO_CFG x0000 0000 0000 C095
OE[0:0] x1 Port is enabled and driving output data
IE[2] x1 Port is enabled and is interpreting input port bits
HAE[3] x0 Receive HW ALERT is disabled
CLC[4] x1 Fclk checking is enabled
SPD[8:5] x4 Speed ratio is 3 GClk Cycles Per Data Beat
ECC[14] x1 ECC checking/correction is enabled
KCL[15] x1 OE bit remains set on an IO port "black-hole"
FEM[17:16] x0 Normal operation
TUN[20:19] x0 Normal port operation
RBOX_IO_ERR x0000 0000 0000 0014
SYN[8:2] x5 IO Port - Syndrome not Valid
RBOX_L_ERR x0000 0000 0000 0000
RBOX_WHOAMI x0000 0000 0000 0002
Physical_CPU_Loc[7:0]x2 If 8P Configuration - Cabinet 0 - Drawer 1 - Module 0 - CPU 0
RBOX_IMASK x0000 0000 3F6D D61F
CCRDL[0] x1 CBOX correctable lock-step interrupts are enabled
CCRDN[1] x1 CBOX correctable (no-ock-step) interrupts are enabled
Z0CRD[2] x1 ZBOX0 correctable interrupts are enabled
Z1CRD[3] x1 ZBOX1 correctable interrupts are enabled
RCRD[4] x1 RBOX correctable interrupts are enabled
CPERF[5] x0 CBOX performance counter interrupts are disabled
Z0PERF[6] x0 ZBOX0 performance counter interrupts are disabled
Z1PERF[7] x0 ZBOX1 performance counter interrupts are disabled
RPERF[8] x0 RBOX performance counter interrupts are disabled
GIOL[9] x1 GIO interrupts are enabled
IOACRD[10] x1 I/O ASIC correctable/SW interrupts are enabled
INTQ[12] x1 Interrupt Queue interrupts are enabled
INTT[15] x1 Interval Timer interrupts are enabled
INTTO[16] x1 Interrupt Queue retry timeout/SW interrupts are enabled
IOHP[17] x0 Hot-Plug IO Event/SW interrupts are disabled
SWAL[18] x1 SW ALERT interrupts are enabled
GIOH[21] x1 GIO high priority interrupts are enabled
HALT[22] x1 HALT/SW interrupts are enabled
HWAL[24] x1 HW ALERT interrupts are enabled
CUCE[25] x1 CBOX uncorrectable interrupts are enabled
Z0UCE[26] x1 ZBOX0 uncorrectable interrupts are enabled
Z1UCE[27] x1 ZBOX1 uncorrectable interrupts are enabled
RUCE[28] x1 RBOX uncorrectable interrupts are enabled
IOAUCE[29] x1 I/O ASIC Error/SW interrupts are enabled
OCLA0[30] x0 OCLA0 interrupts are disabled
OCLA1[31] x0 OCLA1 interrupts are disabled
RBOX_INTQ x0000 0000 0000 8004
IQE[23:0] x8004 Head of the interrupt queue
VAL[24] x0 Queue entry is NOT VALID
RBOX_INT x0000 0000 0880 8108 RBOX Interrupt Status
Z1CRD[3] x1 * A ZBOX1 CORRECTABLE ERROR INTERRUPT HAS OCCURRED
RPERF[8] x1 An RBOX performance counter interrupt has occurred
INTT[15] x1 An Interval Timer interrupt has occurred
Z1UCE[27] x1 * A ZBOX1 UNCORRECTABLE ERROR INTERRUPT HAS OCCURRED
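RBOX_INT appears twice in this event: in the PAL subpacket (x0880 0108) and in the RBOX subpacket (x0880 8108). Checking each set bit against RBOX_IMASK shows which pending interrupts were actually enabled. The sketch below does that. Its name table is copied from the RBOX_IMASK decode in this log, so any bit that table does not list comes out as "unnamed". That covers bit 23, which is set in both RBOX_INT values but is not decoded by the log. Function and table names are hypothetical.

```python
# Sketch: list the interrupts pending in RBOX_INT and whether RBOX_IMASK
# enables them. The bit-name table is transcribed from this log only.

RBOX_INT_NAMES = {
    0: "CCRDL", 1: "CCRDN", 2: "Z0CRD", 3: "Z1CRD", 4: "RCRD",
    5: "CPERF", 6: "Z0PERF", 7: "Z1PERF", 8: "RPERF", 9: "GIOL",
    10: "IOACRD", 12: "INTQ", 15: "INTT", 16: "INTTO", 17: "IOHP",
    18: "SWAL", 21: "GIOH", 22: "HALT", 24: "HWAL", 25: "CUCE",
    26: "Z0UCE", 27: "Z1UCE", 28: "RUCE", 29: "IOAUCE",
    30: "OCLA0", 31: "OCLA1",
}


def pending_interrupts(rbox_int: int, rbox_imask: int) -> list[tuple[str, bool]]:
    """Return (name, enabled_in_mask) for every set bit in RBOX_INT."""
    result = []
    for bit in range(64):
        if rbox_int >> bit & 1:
            name = RBOX_INT_NAMES.get(bit, f"bit {bit} (unnamed)")
            result.append((name, bool(rbox_imask >> bit & 1)))
    return result


if __name__ == "__main__":
    imask = 0x3F6D_D61F  # RBOX_IMASK from the log
    for label, value in (("PAL subpacket", 0x0880_0108),
                         ("RBOX subpacket", 0x0880_8108)):
        print(label, pending_interrupts(value, imask))
```

Of the pending bits, the enabled error interrupts are Z1CRD and Z1UCE. RPERF is pending but masked off.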
IO Subpacket
IO_ASIC_REV x0000 0000 0000 0012
ASIC_REV[3:0] x2 Pass 3
ASIC_Type[7:4] x1 ASIC Type, I07
IO_SYS_REV xFFFF FFFF FFFF 8F12
BP_Rev[3:0] x2 IO Backplane Revision
Platform_Backplane_ID[7:4]x1 IO Configurated in a 2P Drawer
BP_Num[11:8] xF IO Drawer 15 - Drawer 0 if 2P
Hose[14:12] x0 IO Hose 0 or X-Shelf 0
IO7_UPH x0000 0004 1042 4402
UPH_PID[10:0] x402 Logical IO7 # 2
UPH_CD_REQ[15:11] x8 Up-Hose Credit Request
UPH_CD_RIO[20:16] x2 Up-Hose Credit Read IO
UPH_CD_WIO[25:21] x2 Up-Hose Credit Write IO
UPH_CD_BLK[30:26] x4 Up-Hose Credit Block Request
UPH_CD_NBK[35:31] x8 Up-Hose Credit Non-Block Request
UPH_FR_CNT[36] x0 Force Errors as one-shot
UPH_FR_HDR[37] x0 Force Error on Data Flit
UPH_FR_SBE[38] x0 Force no action
UPH_FR_DBE[39] x0 Force no action
UPH_FR_GBG[40] x0 Force no action
UPH_ARB_MODE[42:41] x0 AGP Arb Mode. Port 7,0,1,2,3
HPI_CTL x0000 0000 0000 0011
HP_Offset[4:0] x11 Standard Interrupt for IO Hot-Plug
CPU_Target[22:14] x0 CPU 0
x0 Target CPU for Hot-Plug Interrupt
Hot_Plug_Int_Ena[24]x0 Hot-Plug Interrupts Disabled
CRD_CTL x0000 0000 0100 800A
CRD_Offset[4:0] xA Standard Interrupt for IO Correctable Interrupts
CPU_Target[22:14] x2 CPU 2
x2 Target CPU for IO Correctable Interrupts
CRD_Int_Ena[24] x1 Correctable Interrupts Enabled
HEI_CTL x0000 0000 0100 801D
HEI_Offset[4:0] x1D Standard Interrupt for IO Uncorrectable Interrupts
CPU_Target[22:14] x2 CPU 2
x2 Target CPU for Uncorrectable Interrupts
Hard_Int_Ena[24] x1 Uncorrectable Interrupts Enabled
PO7_ERROR_SUM x0000 0000 0000 0000
ERR_Valid[63] x0 Error Register was NOT Locked
PO7_UNCRR_SYM x0000 0000 0000 0000
ERR_Cycle[15:9] x0 No Cycle Decode Error Occurred
CLK_Symptoms[23:16] x0 No Clock Errors Detected
LSI_STRV[57:48] x0 LSI Interrupt,Bus,Slot and Intx
PO7_CRRCT_SYM x0000 0000 0000 0000
SYN[6:0] x0
ERR_CYC[15:9] x0 No Cycle Decode Error Occurred
PO7_UGBGE_SYM x0000 0000 0000 0000
UPH_PKT_Offset[37:6]x0 Address and Stripe Bit
UPH_PKT_Source_Port[51:48]x0 UpHose Pkt Source - South Port0
UPH_PKT_Dest_ID[62:52]x0 UpHose Destination PID
UPH_GBG_Sent_Valid[63]x0 UpHose Garbage Register NOT Valid
PO7_ERR_PKT0 x03FE 1810 8A04 0806
Bits[4:0] x6 Down-Hose Address Bits <10:6>
Op_Code[12:5] x40 Read Quardword
Bits[44:34] x604 Down-Hose Address Bits <21:11>
Bits[63:48] x3FE Down-Hose Address Bits <37:22>
PO7_ERR_PKT1 x0000 1400 0000 1400
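PO7_ERR_PKT0 spreads one down-hose address over three separate fields: Bits[4:0] hold address <10:6>, Bits[44:34] hold <21:11>, and Bits[63:48] hold <37:22>. The sketch below puts those pieces back together and also pulls out the opcode from Bits[12:5]. Address bits <5:0> are not carried in any of these fields, so the sketch leaves them zero. That is an assumption, as are the helper names.

```python
# Sketch: reassemble the down-hose address from PO7_ERR_PKT0 using the field
# map printed in the log. Address bits <5:0> are absent and left as zero.


def field(value: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def down_hose_address(pkt0: int) -> int:
    return ((field(pkt0, 63, 48) << 22)    # address <37:22>
            | (field(pkt0, 44, 34) << 11)  # address <21:11>
            | (field(pkt0, 4, 0) << 6))    # address <10:6>


if __name__ == "__main__":
    pkt0 = 0x03FE_1810_8A04_0806
    print(f"opcode  = 0x{field(pkt0, 12, 5):X}")
    print(f"address = 0x{down_hose_address(pkt0):X}")
```

For the logged packet the sketch gives opcode 0x40, which the log decodes as a quadword read, and the address 0xFFB02180.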
PO0_ERR_SUM x0000 0000 0000 0000
UPE_ERROR[58:52] x0 No Up-Hose Engine Error Detected
ERR_Valid[63] x0 Error Register was NOT Locked
PO0_TLB_ERR x0000 0000 0000 0000
ERR_CODE[1:0] x0 No Error Condition Occurred
NOT_AGP[2] x0 AGP Transaction
ERR_TLB_PTR[5:3] x0 Index to Failing TLB Entry
FADDR[47:6] x0 Physical Memory Addr of TLB Entry Fetched
TLB_ERR_VALID[63] x0 TLB Register was NOT Locked
PO0_SPL_COMPLT x0000 0000 0000 0000
SPL_COMPLT[31:0] x0 Split Completion Message
SPL_COMPLT_FUNC[34:32]x0 Function Number
SPL_COMPLT_DEV[39:35]x0 Device Number
SPL_COMPLT_BUS[47:40]x0 Bus Number
PO0_TRANS_SUM x0000 0000 0000 0000
PCI_ADDR[49:0] x0
PCIX_Master_SLOT[55:52]x0 * IO7 was Master if Register Locked
PCIX_CMD[59:56] x0 Interrupt Acknowledge
ERR_VALID[63] x0 Error Register was NOT Locked
PO0_FIRST_ERR x0000 0000 0000 0000
UPE_ERROR[58:52] x0 No UPE Error Detected
ERR_Valid[63] x0 Error Register was NOT Locked
PO0_DM_SOURCE x0000 0000 0000 0000
PO0_DM_DEST x0000 0000 0000 0000
PO0_DM_SIZE x0000 0000 0000 0000
PO0_DM_CTRL x0000 0000 0000 0002
PO1_ERR_SUM x0000 0000 0000 0000
UPE_ERROR[58:52] x0 No Up-Hose Engine Error Detected
ERR_Valid[63] x0 Error Register was NOT Locked
PO1_TLB_ERR x0000 0000 0000 0000
ERR_CODE[1:0] x0 No Error Condition Occurred
NOT_AGP[2] x0 AGP Transaction
ERR_TLB_PTR[5:3] x0 Index to Failing TLB Entry
FADDR[47:6] x0 Physical Memory Addr of TLB Entry Fetched
TLB_ERR_VALID[63] x0 TLB Register was NOT Locked
PO1_SPL_COMPLT x0000 0000 0000 0000
SPL_COMPLT[31:0] x0 Split Completion Message
SPL_COMPLT_FUNC[34:32]x0 Function Number
SPL_COMPLT_DEV[39:35]x0 Device Number
SPL_COMPLT_BUS[47:40]x0 Bus Number
PO1_TRANS_SUM x0000 0000 0000 0000
PCI_ADDR[49:0] x0
PCIX_Master_SLOT[55:52]x0 * IO7 was Master if Register Locked
PCIX_CMD[59:56] x0 Interrupt Acknowledge
ERR_VALID[63] x0 Error Register was NOT Locked
PO1_FIRST_ERR x0000 0000 0000 0000
UPE_ERROR[58:52] x0 No UPE Error Detected
ERR_Valid[63] x0 Error Register was NOT Locked
PO1_DM_SOURCE x0000 0000 0000 0000
PO1_DM_DEST x0000 0000 0000 0000
PO1_DM_SIZE x0000 0000 0000 0000
PO1_DM_CTRL x0000 0000 0000 0002
PO1_HP_MISC x0000 0000
Hot_Plug_Intr_Switch[24]x0
Hot_Plug_Pwr_Fault[25]x0
Hot_Plug_Prsnt_Change[26]x0
Hot_Plug_Switch_Transition[27]x0
Register_Valid[31] x0 Register VALID
PO1_HP_EVNT x0000 0000
Register_Valid[31] x0 Register VALID
PO2_ERR_SUM x0000 0000 0000 0000
UPE_ERROR[58:52] x0 No Up-Hose Engine Error Detected
ERR_Valid[63] x0 Error Register was NOT Locked
PO2_TLB_ERR x0000 0000 0000 0000
ERR_CODE[1:0] x0 No Error Condition Occurred
NOT_AGP[2] x0 AGP Transaction
ERR_TLB_PTR[5:3] x0 Index to Failing TLB Entry
FADDR[47:6] x0 Physical Memory Addr of TLB Entry Fetched
TLB_ERR_VALID[63] x0 TLB Register was NOT Locked
PO2_SPL_COMPLT x0000 0000 0000 0000
SPL_COMPLT[31:0] x0 Split Completion Message
SPL_COMPLT_FUNC[34:32]x0 Function Number
SPL_COMPLT_DEV[39:35]x0 Device Number
SPL_COMPLT_BUS[47:40]x0 Bus Number
PO2_TRANS_SUM x0000 0000 0000 0000
PCI_ADDR[49:0] x0
PCIX_Master_SLOT[55:52]x0 * IO7 was Master if Register Locked
PCIX_CMD[59:56] x0 Interrupt Acknowledge
ERR_VALID[63] x0 Error Register was NOT Locked
PO2_FIRST_ERR x0000 0000 0000 0000
UPE_ERROR[58:52] x0 No UPE Error Detected
ERR_Valid[63] x0 Error Register was NOT Locked
PO2_DM_SOURCE x0000 0000 0000 0000
PO2_DM_DEST x0000 0000 0000 0000
PO2_DM_SIZE x0000 0000 0000 0000
PO2_DM_CTRL x0000 0000 0000 0002
PO2_HP_MISC x8000 0000
Hot_Plug_Intr_Switch[24]x0
Hot_Plug_Pwr_Fault[25]x0
Hot_Plug_Prsnt_Change[26]x0
Hot_Plug_Switch_Transition[27]x0
Register_Valid[31] x1 Register NOT Valid
PO2_HP_EVNT x8000 0000
Register_Valid[31] x1 Register NOT Valid
PO3_ERR_SUM x0000 0000 0000 0000
UPE_ERROR[58:52] x0 No Up-Hose Engine Error Detected
ERR_Valid[63] x0 Error Register was NOT Locked
PO3_TLB_ERR x0000 0000 0000 0000
ERR_CODE[1:0] x0 No Error Condition Occurred
NOT_AGP[2] x0 AGP Transaction
ERR_TLB_PTR[5:3] x0 Index to Failing TLB Entry
FADDR[47:6] x0 Physical Memory Addr of TLB Entry Fetched
TLB_ERR_VALID[63] x0 TLB Register was NOT Locked
PO3_SPL_COMPLT x0000 0000 0000 0000
SPL_COMPLT[31:0] x0 Split Completion Message
SPL_COMPLT_FUNC[34:32]x0 Function Number
SPL_COMPLT_DEV[39:35]x0 Device Number
SPL_COMPLT_BUS[47:40]x0 Bus Number
PO3_TRANS_SUM x0000 0000 0000 0000
PCI_ADDR[49:0] x0
PCIX_Master_SLOT[55:52]x0 * IO7 was Master if Register Locked
PCIX_CMD[59:56] x0 Interrupt Acknowledge
ERR_VALID[63] x0 Error Register was NOT Locked
PO3_FIRST_ERR x0000 0000 0000 0000
UPE_ERROR[58:52] x0 No UPE Error Detected
ERR_Valid[63] x0 Error Register was NOT Locked
PO3_DM_SOURCE x0000 0000 0000 0000
PO3_DM_DEST x0000 0000 0000 0000
PO3_DM_SIZE x0000 0000 0000 0000
PO3_DM_CTRL x0000 0000 0000 0002
Event: 8
Description: VMS Crash Restart Event at Mon 6 Aug 2018 10:35:36 GMT+00:00 from THMG03
File: ./clue$errlog.sys
================================================================================
COMMON EVENT HEADER (CEH) V2.0
Event_Leader xFFFF FFFE
Header_Length 308
Event_Length 824
Header_Rev_Major 2
Header_Rev_Minor 0
OS_Type 2 -- OpenVMS
Hardware_Arch 4 -- Alpha
CEH_Vendor_ID 3,564 -- Hewlett-Packard Company
Hdwr_Sys_Type 39 -- hp AlphaServer EV7 Series
Logging_CPU 0 -- CPU Logging this Event
CPUs_In_Active_Set 0
Major_Class 37
Minor_Class 0
Entry_Type 37 -- VMS Crash Restart Event
DSR_Msg_Num 2,030 -- AlphaServer ES47
Chip_Type 15 -- EV7 - 21364
CEH_Device 0
CEH_Device_ID_0 x0000 0000
CEH_Device_ID_1 x0000 0000
CEH_Device_ID_2 x0000 0000
Unique_ID_Count 27,301
Unique_ID_Prefix 0
Num_Strings 7
TLV Section of CEH
TLV_Sys_Serial_Num IE84200129
TLV_DSR_String hp AlphaServer ES47 7/1150
TLV_Time_as_Local Mon 6 Aug 2018 10:35:36 GMT+00:00
TLV_OS_Version V7.3-2
TLV_Computer_Name THMG03
Entry_Type 37
OpenVMS Crash Restart Event Data
KSP x0000 0000 7FF8 7EE0
ESP x0000 0000 7FF8 C000
SSP x0000 0000 7FF9 CD00
USP x0000 0000 7AE7 B850
R0 x0000 0000 0000 0001
R1 xFFFF FFFF 86BF C000
R2 x0000 0000 0000 0210
R3 x0000 0000 0000 0001
R4 x0000 0000 0000 0000
R5 x0000 0008 0000 2000
R6 x0000 0000 0000 001A
R7 x0000 0000 0009 79C5
R8 x0000 0000 0000 0000
R9 x0000 0000 0007 C100
R10 x0000 0000 0008 C230
R11 x0000 0000 0008 C110
R12 x0000 0000 0005 006C
R13 x0000 0000 0002 0000
R14 x0000 0000 7C09 ED5C
R15 x0000 0000 0000 0001
R16 x0000 0000 0000 0215
R17 x0000 0000 0000 0000
R18 x0000 0000 0000 0210
R19 x0000 0000 0000 0006
R20 x0000 0000 0000 0040
R21 x0000 0000 0000 0000
R22 x0000 0000 0000 0000
R23 x0000 0000 0000 0000
R24 xFFFF FFFF 86BF C000
R25 x0000 0000 0000 0001
R26 xFFFF FFFF 8001 4FA4
R27 xFFFF FFFF 869D 1AC0
R28 xFFFF FFFF 8006 AA90
FP x0000 0000 7FF8 7EE0
SP x0000 0000 7FF8 7EE0
PC xFFFF FFFF 8001 4FD0
PSL x2000 0000 0000 1F04
PTBR x0000 0000 0003 5EB4
PCBB x0000 0000 6BD6 A080
PRBR xFFFF FFFF 811C 7400
VPTB xFFFF FEFA 0000 0000
SCBB x0000 0000 0000 0F20
SISR x0000 0000 0000 0000
ASN x0000 0000 0000 007B
ASTSR_ASTEN x0000 0000 0000 000F
FEN x0000 0000 0000 0001
IPL x0000 0000 0000 001F
MCES x0000 0000 0000 0000
Crash_Code x0000 0215
Process_ID x0002 00AE
Process_Name_Length 14
Process_Name RMI_grib_srvr3
Crash_Code_String MACHINECHK, Machine check while in kernel mode
It looks like the ZBOX memory controller is throwing the errors, which would suggest that the RIMM modules are the issue.
Any thoughts on that (apart from reseating the parts)?