[Info-vax] Alphaserver ES47: Suspected broken CPU, unable to stop/cpu 2

Robin Schrievers robin.schrievers at meteogroup.com
Thu Aug 9 04:01:24 EDT 2018


Meanwhile, Roy has been a great help in analysing the errlog.sys file from the latest crash. This is the result:

Event:       2
Description: UnCorrectable System Event  at Mon 6 Aug 2018 10:35:36 GMT+00:00 from THMG03  
File:        ./clue$errlog.sys
================================================================================

COMMON EVENT HEADER (CEH) V2.0
Event_Leader           xFFFF FFFE              
Header_Length           308                    
Event_Length            1,416                  
Header_Rev_Major        2                      
Header_Rev_Minor        0                      
OS_Type                 2                      -- OpenVMS
Hardware_Arch           4                      -- Alpha
CEH_Vendor_ID           3,564                  -- Hewlett-Packard Company
Hdwr_Sys_Type           39                     -- hp AlphaServer EV7 Series
Logging_CPU             2                      -- CPU Logging this Event
CPUs_In_Active_Set      0                      
Major_Class             27                     
Minor_Class             0                      
Entry_Type              660                    -- UnCorrectable System Event 
DSR_Msg_Num             2,030                  -- AlphaServer ES47
Chip_Type               15                     -- EV7 - 21364
CEH_Device              0                      
CEH_Device_ID_0        x0000 0000              
CEH_Device_ID_1        x0000 0000              
CEH_Device_ID_2        x0000 0000              
Unique_ID_Count         27,300                 
Unique_ID_Prefix        0                      
Num_Strings             7                      

TLV Section of CEH
TLV_Sys_Serial_Num     IE84200129              
TLV_DSR_String         hp AlphaServer ES47 7/1150
TLV_Time_as_Local      Mon 6 Aug 2018 10:35:36 GMT+00:00
TLV_OS_Version         V7.3-2                  
TLV_Computer_Name      THMG03                  
Entry_Type              660                    

SysType39_Processing

START OF SUBPACKETS IN THIS EVENT

Logout Header Frame
Frame_ID_Flags         x0000 0011              
   System_Type_ID[15:0]x11                     GS1280 Series - 2 P Drawer
   Recoverable_Flag[28]x0                      
   Second_Error_Flag[30]x0                      
   Retry_Flag[31]      x0                      
CPU_Offset             x0000 0060              
   CPU_Offset[31:0]    x60                     
System_Offset          x0000 0268              
   System_Offset[31:0] x268                    

PAL Specific Subpacket
Machine_Check_code     x0000 0202              
   Mchk_Check_Code[31:0]x202                    * 660 - SYSTEM DETECTED UNCORRECTABLE ERROR
Subpacket_Count        x0000 0004              
   Subpkt_Count[31:0]  x4                      
Processor_WHAMI        x0000 0000 0000 0002    
   CPU_WHAMI[7:0]      x2                      CPU 2
RBOX_WHOAMI            x0000 0000 0000 0002    
   Physical_CPU_Loc[7:0]x2                      If 8P Configuration - Cabinet 0 - Drawer 1 - Module 0 - CPU 0
RBOX_INT               x0000 0000 0880 0108    RBOX Interrupt Status
   Z1CRD[3]            x1                      * A ZBOX1 CORRECTABLE ERROR INTERRUPT HAS OCCURRED
   RPERF[8]            x1                      An RBOX performance counter interrupt has occurred
   Z1UCE[27]           x1                      * A ZBOX1 UNCORRECTABLE ERROR INTERRUPT HAS OCCURRED
Exc_Addr               x0000 0000 0003 D7F0    Exception Address Register
   EXC_ADDR[63:0]      x3 D7F0                 Exception Address
Time_Stamp             x0000 1208 060A 120C    
   Second[7:0]         xC                      
   Minute[15:8]        x12                     
   Hour[23:16]         xA                      
   Day[31:24]          x6                      
   Month[39:32]        x8                      
   Year[47:40]         x12                     
Halt_Code_Reason       x0000 0000 0000 1000    
   Halt_Reason_Code[15:0]x1000                   * MACHINE CHECK CALL IN OPERATING SYSTEM
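
To cross-check the decoder, the Time_Stamp subfields above can be reproduced by slicing the raw quadword at the bit ranges printed in the dump. This is a minimal Python sketch: the helper name `bits` and the field table are illustrative and not part of any decoder tool. The values are left as raw integers, since the dump does not say what epoch the year field uses.

```python
# Raw Time_Stamp quadword as printed in the PAL Specific Subpacket.
TIME_STAMP = 0x0000_1208_060A_120C

def bits(value, hi, lo):
    """Return the unsigned field value[hi:lo], both bounds inclusive."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

# Bit ranges copied from the dump: name -> (high bit, low bit).
FIELDS = {
    "second": (7, 0),
    "minute": (15, 8),
    "hour": (23, 16),
    "day": (31, 24),
    "month": (39, 32),
    "year": (47, 40),
}

decoded = {name: bits(TIME_STAMP, hi, lo) for name, (hi, lo) in FIELDS.items()}

for name, value in decoded.items():
    # Print decimal and hex so the output can be compared with the dump.
    print(f"{name:<6} = {value:3d} (x{value:X})")
```

The hex column should match the xC, x12, xA, x6, x8 and x12 values shown in the dump.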

Processor Subpacket
I_STAT                 x0000 0025 0000 0000    I BOX Status Register
   OVR[32:30]          x4                      ProfileMe Counter 0 Overcount
   ICM[33]             x0                      ProfileMe Icache Miss
   TRAP_TYPE[37:34]    x9                      See PMPC <14:0>
   LS0[38]             x0                      ProfileMe Load - Mbox Load-Store Order Replay Trap
   TRP[39]             x0                      ProfileMe Trap - Check Trap Type [3:0]
   MSI[40]             x0                      ProfileMe Mispredict Trap
DC_STAT                x0000 0000 0000 0000    Dcache Status Register
C_Addr                 x0000 07E8 FFB0 2000    Cbox Address
   ERR_ADR[42:0]       x7E8 FFB0 2000          Error Address
C_Syndrome_1           x0000 0000 0000 0000    Syndrome for Upper Quadword
   C_Syndrome_1[8:0]   x0                      
C_Syndrome_0           x0000 0000 0000 0000    Syndrome for Lower Quadword
C_STAT                 x0000 0000 0000 0000    Cbox Status
   C_STAT[4:0]         x0                      
C_STS                  x0000 0000 0000 0000    Cbox Block Status
   C_STS[3:0]          x0                      Shared
MM_STAT                x0000 0000 0000 00B0    Memory Management Status
   OPCODE[9:4]         xB                      OpCode that caused the ERROR
Exc_Addr               x0000 0000 0003 D7F0    Exception Address Register
   EXC_ADDR[63:0]      x3 D7F0                 Exception Address
IER_CM                 x0000 007E FFFF E018    Interrupt Enable and Current Mode
   CM[4:3]             x3                      User Mode
   ASTEN[13]           x1                      AST Interrupt Enable
   SIEN[28:14]         x7FFF                   Software Interrupt Enables
   PCEN[30:29]         x3                      Performance Counter Interrupt Enable
   CREN[31]            x1                      Correctable Read Error Interrupt Enable
   EIEN[38:33]         x3F                     External Interrupt Enables
ISUM                   x0000 0042 0000 0000    Interrupt Summary Register
   EI[38:33]           x21                     External Interrupts
PAL_BASE               x0000 0008 0003 0000    PAL Base Register
   PAL_BASE[43:15]     x10 0006                Base Physical Address for PAL Code
I_CTL                  xFFFF FEFA 0430 0386    Ibox Control
   SPCE[0]             x0                      System Performance Counting Enable
   IC_EN[2:1]          x3                      Icache Set Enable
   SPE[5:3]            x0                      Super Page Mode Enable
   RPM[6]              x0                      Reduced Page Mode
   SDE[7]              x1                      Access to PAL Shadow Registers Enabled
   SBE[9:8]            x3                      Stream Buffer Enable
   BP_MODE[11:10]      x0                      Branch Prediction Mode
   ST_WAIT_64K[20]     x1                      stWait Table cleared after 64K Cycles
   MCHK_EN[21]         x1                      Machine Checks are Enabled
   BIST_FAIL[23]       x0                      BIST has run Successfully
   CHIP_ID[29:24]      x4                      Revision ID of CPU Chip
   VPTB[47:30]         x3 FBE8                 Virtual Page Table Base
   SEXT_VPTB_47[63:48] xFFFF                   Sign Extended VPTB <47>
Process_Context        x0000 3D80 0000 01E4    Process Context
   PPCE[1]             x0                      Process Performance Counting Enable
   FPE[2]              x1                      Floating Point Enable
   ASTER[8:5]          xF                      AST Enable Register
   ASTRR[12:9]         x0                      AST Request Register
   ASN[46:39]          x7B                     Address Space Number
CBOX_CTL               x1600 0000 1004 A802    Cbox Control
   PID[7:0]            x2                      Processor ID
   PAGE_MIGR_FAST[9]   x0                      16 events between migration samples
   CACHE_ISTM[13]      x1                      Cache istream fills in bcache
   ENA_ECC[15]         x1                      ECC checking is enabled
   ACC_CLUMP[17]       x0                      Local/Global access checks are for a single processor
   LPACC[18]           x1                      Bypass local memory access checks
   PRBQ_STXC_DIS[28]   x1                      PRBQ treats StoD_STxC's as StoD's
   OCLA_ENA[63]        x0                      OCLA is disabled (copy of QBOX bit)
CBOX_STP_CTL           x0000 0000 0000 0000    Cbox Stripe Control
   STP[63:0]           x0                      
CBOX_ACC_CTL           x0000 0000 0000 0000    Cbox Access Control
   ACC[63:0]           x0                      
CBOX_LCL_SET           xFFFF FFFF FFFF FFFF    Cbox Local Processor Set
   LCL[63:0]           xFFFF FFFF FFFF FFFF    
CBOX_GBL_SET           x0000 0000 0000 0000    Cbox  Global Processor Set
   GBL[63:0]           x0                      
BBOX_CTL               x0000 0000 0001 C67F    Bbox Control
   SET_ENA[6:0]        x7F                     L2 Cache Set Enable
   EVICT_NEXT[10:8]    x6                      Evict Next Set
   BC_STS_PAR_ENA[14]  x1                      L2 Cache sts Parity Check Enable
   BC_TAG_PAR_ENA[15]  x1                      L2 Cache Tag Parity Check Enable
   TTAG_PARITY_ENA[16] x1                      Ttag Parity Check Enable
BBOX_ERR_STS           x0000 0000 0000 0000    Bbox Error Status
   BSTS_PAR[6:0]       x0                      
   BTAG_PAR[14:8]      x0                      
   TTAG_PAR[23:16]     x0                      
   BBOX_ERR_MASK[26:24]x0                      No Error Occurred
BBOX_ERR_IDX           x0000 0000 0003 FBC0    Bbox Error Index
   IDX[17:6]           xFEF                    L2 Cache locked / index of most recent Parity Error
CBOX_DDP_ERR_STS       x0000 0000 0000 0000    Cbox Data Path Error Status
   CBOX_ERR_MASK[2:0]  x0                      No Error Occurred
   ERR_SET[5:3]        x0                      L2 Cache Set with Error
   ERR_IDX[17:6]       x0                      L2 Cache Set Index with Error
   ERR_SYN[28:20]      x0                      
   SIDE[29]            x0                      DP0 or DP1
BBOX_DAT_RMP           x0000 0000 0000 0000    Bbox Data Remap Information
   NE_REMAP[6:0]       x0                      
   NW_REMAP[14:8]      x0                      
   SW_REMAP[22:16]     x0                      
   SE_REMAP[30:24]     x0                      
   BTAG_BAD[31]        x0                      Btag BIST Passed

ZBOX Subpacket
ZBOX0_DRAM_ERR_Status1 x0000 0000              
   DIR_ERRSYN[10:5]    x0                      
   DAT_ERRSYN1[22:14]  x0                      
   DAT_ERRSYN0[31:23]  x0                      
ZBOX0_DRAM_ERR_Status2 x0000 000B              
   TEMPCAL_CHAN[4:0]   xB                      
   TEMPCAL_DEV[9:5]    x0                      
   DAT_ERRSYN3[22:14]  x0                      
   DAT_ERRSYN2[31:23]  x0                      
ZBOX0_DRAM_ERR_Status3 x0000 0000              
ZBOX0_DRAM_ERROR_CTL   x2082 133F              
   RAID_ON[24]         x0                      RAID channel is not in use
ZBOX0_DRAM_ERR_ADR     x0000 0000              
   ERR_ADDR[28:0]      x0                      
ZBOX0_DIFT_Timeout     x8400 0000              
ZBOX0_DRAM_MAPPER_CTL  x2E21 CC8B              
ZBOX0_FRC_ERR_ADR      x0000 0001              
ZBOX0_DIFT_ERR_Status  x0000 0000              
ZBOX1_DRAM_ERR_Status1 x0000 0000              
   DIR_ERRSYN[10:5]    x0                      
   DAT_ERRSYN1[22:14]  x0                      
   DAT_ERRSYN0[31:23]  x0                      
ZBOX1_DRAM_ERR_Status2 x0800 0003              
   TEMPCAL_CHAN[4:0]   x3                      
   TEMPCAL_DEV[9:5]    x0                      
   DAT_ERRSYN3[22:14]  x0                      
   DAT_ERRSYN2[31:23]  x10                     OctaWord2 ECC Check Bit 4
ZBOX1_DRAM_ERR_Status3 x0000 2002              
   SGL[1]              x1                      * ONE OR TWO SINGLE BIT ECC ERRORS WERE DETECTED ON A MEMORY READ
   MEO[13]             x1                      * A SECOND CORRECTABLE ERROR OCCURRED FOR WHICH NO PHYSICAL ADDRESS WAS LATCHED
ZBOX1_DRAM_ERROR_CTL   x2082 133F              
   RAID_ON[24]         x0                      RAID channel is not in use
ZBOX1_DRAM_ERR_ADR     x0000 0000              
   ERR_ADDR[28:0]      x0                      
ZBOX1_DIFT_Timeout     x8400 0000              
ZBOX1_DRAM_MAPPER_CTL  x2E21 CC8B              
ZBOX1_FRC_ERR_ADR      x0000 0000              
ZBOX1_DIFT_ERR_Status  x0000 0000              
CBOX_CTL               x1600 0000 1004 A802    Cbox Control
   PID[7:0]            x2                      Processor ID
   PAGE_MIGR_FAST[9]   x0                      16 events between migration samples
   CACHE_ISTM[13]      x1                      Cache istream fills in bcache
   ENA_ECC[15]         x1                      ECC checking is enabled
   ACC_CLUMP[17]       x0                      Local/Global access checks are for a single processor
   LPACC[18]           x1                      Bypass local memory access checks
   PRBQ_STXC_DIS[28]   x1                      PRBQ treats StoD_STxC's as StoD's
   OCLA_ENA[63]        x0                      OCLA is disabled (copy of QBOX bit)
CBOX_STP_CTL           x0000 0000 0000 0000    Cbox Stripe Control
   STP[63:0]           x0                      
ZBOX0_ERROR_PA         x0000 0008 0000 0000    
ZBOX1_ERROR_PA         x0000 0008 0000 0000    
ZBOX0_ORED_SYNDROME    x0000 0000 0000 0000    
ZBOX1_ORED_SYNDROME    x0000 0000 0000 0000    

RBOX Subpacket
RBOX_CFG               x0000 0000 C976 C6C1    
   SPD[3:0]            x1                      Pin Speed Ratio is 1.5 GClk Cycles Per Data Beat
   PRI[4]              x0                      East/West is the primary axis
   ADA[6]              x1                      Enable adaptive routing
   ADB[7]              x1                      Enable adaptive buffers
   BRO[12:9]           x3                      Broadcasts will be forwarded to the N & S IP ports
   SYF[16:14]          x3                      SYNCH interval = 16K, period = 4M
   RSE[17]             x1                      Target Reservations are enabled
   STI[20:18]          x5                      Starvation interval = 1023
   DRE[21]             x1                      Drain is enabled
   DRI[24:22]          x5                      Drain Interval = 1023
   TAS[26:25]          x0                      InvalAck Table A Select = N
   TBS[28:27]          x1                      InvalAck Table B Select = S
   TCS[30:29]          x2                      InvalAck Table C Select = E
   IRW[31]             x1                      Writes to the router table will be ignored
RBOX_N_CFG             x0000 0000 0000 3715    North IP port configuration
   OE[0]               x1                      Port is enabled and driving output data
   IE[2]               x1                      Port is enabled and is interpreting input port bits
   BRO[6:3]            x2                      Broadcasts will be forwarded to the S IP port
   BIL[8]              x1                      Broadcast Inval Limit = 16
   HAE[9]              x1                      Receive HW ALERT is enabled
   SAE[10]             x1                      Receive SW ALERT is enabled
   SYC[11]             x0                      Receive SYNCH is disabled
   CLC[12]             x1                      Fclk checking is enabled
   ECC[13]             x1                      ECC checking/correction is enabled
   UNI[15:14]          x0                      Unload pointer init value (for clock-forward reset)
   FEM[19:18]          x0                      Normal operation
   TUN[22:21]          x0                      Normal port operation
RBOX_S_CFG             x0000 0000 0000 360D    South IP port configuration
   OE[0]               x1                      Port is enabled and driving output data
   IE[2]               x1                      Port is enabled and is interpreting input port bits
   BRO[6:3]            x1                      Broadcasts will be forwarded to the N IP port
   BIL[8]              x0                      Broadcast Inval Limit = 8
   HAE[9]              x1                      Receive HW ALERT is enabled
   SAE[10]             x1                      Receive SW ALERT is enabled
   SYC[11]             x0                      Receive SYNCH is disabled
   CLC[12]             x1                      Fclk checking is enabled
   ECC[13]             x1                      ECC checking/correction is enabled
   UNI[15:14]          x0                      Unload pointer init value (for clock-forward reset)
   FEM[19:18]          x0                      Normal operation
   TUN[22:21]          x0                      Normal port operation
RBOX_E_CFG             x0000 0000 0000 0618    East IP port configuration
   OE[0]               x0                      Output port is disabled - messages destined to pass through this output port are "black-holed"
   IE[2]               x0                      Port is disabled or has Black Holed - input port bits are being ignored
   BRO[6:3]            x3                      Broadcasts will be forwarded to the S & N IP ports
   BIL[8]              x0                      Broadcast Inval Limit = 8
   HAE[9]              x1                      Receive HW ALERT is enabled
   SAE[10]             x1                      Receive SW ALERT is enabled
   SYC[11]             x0                      Receive SYNCH is disabled
   CLC[12]             x0                      * FCLK CHECKING IS DISABLED
   ECC[13]             x0                      * ECC CHECKING/CORRECTION IS DISABLED
   UNI[15:14]          x0                      Unload pointer init value (for clock-forward reset)
   FEM[19:18]          x0                      Normal operation
   TUN[22:21]          x0                      Normal port operation
RBOX_W_CFG             x0000 0000 0000 0618    West IP port configuration
   OE[0]               x0                      Output port is disabled - messages destined to pass through this output port are "black-holed"
   IE[2]               x0                      Port is disabled or has Black Holed - input port bits are being ignored
   BRO[6:3]            x3                      Broadcasts will be forwarded to the S & N IP ports
   BIL[8]              x0                      Broadcast Inval Limit = 8
   HAE[9]              x1                      Receive HW ALERT is enabled
   SAE[10]             x1                      Receive SW ALERT is enabled
   SYC[11]             x0                      Receive SYNCH is disabled
   CLC[12]             x0                      * FCLK CHECKING IS DISABLED
   ECC[13]             x0                      * ECC CHECKING/CORRECTION IS DISABLED
   UNI[15:14]          x0                      Unload pointer init value (for clock-forward reset)
   FEM[19:18]          x0                      Normal operation
   TUN[22:21]          x0                      Normal port operation
RBOX_N_ERR             x0000 0000 0000 0064    RBOX North Port Error Status
   SYN[8:2]            x19                     North IP Port Data Bit 21
RBOX_S_ERR             x0000 0000 0000 00D0    RBOX South Port Error Status
   SYN[8:2]            x34                     South IP Port Data Bit 15
RBOX_E_ERR             x0000 0000 0000 0000    RBOX East Port Error Status
   SYN[8:2]            x0                      
RBOX_W_ERR             x0000 0000 0000 0000    RBOX West Port Error Status
   SYN[8:2]            x0                      
RBOX_IO_CFG            x0000 0000 0000 C095    
   OE[0:0]             x1                      Port is enabled and driving output data
   IE[2]               x1                      Port is enabled and is interpreting input port bits
   HAE[3]              x0                      Receive HW ALERT is disabled
   CLC[4]              x1                      Fclk checking is enabled
   SPD[8:5]            x4                      Speed ratio is 3 GClk Cycles Per Data Beat
   ECC[14]             x1                      ECC checking/correction is enabled
   KCL[15]             x1                      OE bit remains set on an IO port "black-hole"
   FEM[17:16]          x0                      Normal operation
   TUN[20:19]          x0                      Normal port operation
RBOX_IO_ERR            x0000 0000 0000 0014    
   SYN[8:2]            x5                      IO Port - Syndrome not Valid
RBOX_L_ERR             x0000 0000 0000 0000    
RBOX_WHOAMI            x0000 0000 0000 0002    
   Physical_CPU_Loc[7:0]x2                      If 8P Configuration - Cabinet 0 - Drawer 1 - Module 0 - CPU 0
RBOX_IMASK             x0000 0000 3F6D D61F    
   CCRDL[0]            x1                      CBOX correctable lock-step interrupts are enabled
   CCRDN[1]            x1                      CBOX correctable (no-ock-step) interrupts are enabled
   Z0CRD[2]            x1                      ZBOX0 correctable interrupts are enabled
   Z1CRD[3]            x1                      ZBOX1 correctable interrupts are enabled
   RCRD[4]             x1                      RBOX correctable interrupts are enabled
   CPERF[5]            x0                      CBOX performance counter interrupts are disabled
   Z0PERF[6]           x0                      ZBOX0 performance counter interrupts are disabled
   Z1PERF[7]           x0                      ZBOX1 performance counter interrupts are disabled
   RPERF[8]            x0                      RBOX performance counter interrupts are disabled
   GIOL[9]             x1                      GIO interrupts are enabled
   IOACRD[10]          x1                      I/O ASIC correctable/SW interrupts are enabled
   INTQ[12]            x1                      Interrupt Queue interrupts are enabled
   INTT[15]            x1                      Interval Timer interrupts are enabled
   INTTO[16]           x1                      Interrupt Queue retry timeout/SW interrupts are enabled
   IOHP[17]            x0                      Hot-Plug IO Event/SW interrupts are disabled
   SWAL[18]            x1                      SW ALERT interrupts are enabled
   GIOH[21]            x1                      GIO high priority interrupts are enabled
   HALT[22]            x1                      HALT/SW interrupts are enabled
   HWAL[24]            x1                      HW ALERT interrupts are enabled
   CUCE[25]            x1                      CBOX uncorrectable interrupts are enabled
   Z0UCE[26]           x1                      ZBOX0 uncorrectable interrupts are enabled
   Z1UCE[27]           x1                      ZBOX1 uncorrectable interrupts are enabled
   RUCE[28]            x1                      RBOX uncorrectable interrupts are enabled
   IOAUCE[29]          x1                      I/O ASIC Error/SW interrupts are enabled
   OCLA0[30]           x0                      OCLA0 interrupts are disabled
   OCLA1[31]           x0                      OCLA1 interrupts are disabled
RBOX_INTQ              x0000 0000 0000 8004    
   IQE[23:0]           x8004                   Head of the interrupt queue
   VAL[24]             x0                      Queue entry is NOT VALID
RBOX_INT               x0000 0000 0880 8108    RBOX Interrupt Status
   Z1CRD[3]            x1                      * A ZBOX1 CORRECTABLE ERROR INTERRUPT HAS OCCURRED
   RPERF[8]            x1                      An RBOX performance counter interrupt has occurred
   INTT[15]            x1                      An Interval Timer interrupt has occurred
   Z1UCE[27]           x1                      * A ZBOX1 UNCORRECTABLE ERROR INTERRUPT HAS OCCURRED
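
The RBOX_INT summary can be checked the same way by listing which bits are set in the raw value. The Python sketch below assumes RBOX_INT uses the same bit positions as the RBOX_IMASK listing above, which holds for the four bits the dump does decode. The name table is copied from that listing, and `set_bits` is an illustrative helper. Bit 23 is set in the raw value but has no decoded line in the dump, so the sketch reports it as unnamed; what that bit means is not stated here.

```python
# Raw RBOX_INT quadword from the RBOX Subpacket.
RBOX_INT = 0x0000_0000_0880_8108

# Bit names copied from the RBOX_IMASK listing in this dump.
# ASSUMPTION: RBOX_INT shares this layout; bits absent here stay unnamed.
NAMES = {
    0: "CCRDL", 1: "CCRDN", 2: "Z0CRD", 3: "Z1CRD", 4: "RCRD",
    5: "CPERF", 6: "Z0PERF", 7: "Z1PERF", 8: "RPERF", 9: "GIOL",
    10: "IOACRD", 12: "INTQ", 15: "INTT", 16: "INTTO", 17: "IOHP",
    18: "SWAL", 21: "GIOH", 22: "HALT", 24: "HWAL", 25: "CUCE",
    26: "Z0UCE", 27: "Z1UCE", 28: "RUCE", 29: "IOAUCE",
    30: "OCLA0", 31: "OCLA1",
}

def set_bits(value, width=64):
    """Return the positions of all 1 bits in value, lowest first."""
    return [i for i in range(width) if (value >> i) & 1]

pending = [(i, NAMES.get(i, "unnamed")) for i in set_bits(RBOX_INT)]

for bit, name in pending:
    print(f"bit {bit:2d}: {name}")
```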

IO Subpacket
IO_ASIC_REV            x0000 0000 0000 0012    
   ASIC_REV[3:0]       x2                      Pass 3
   ASIC_Type[7:4]      x1                      ASIC Type, I07
IO_SYS_REV             xFFFF FFFF FFFF 8F12    
   BP_Rev[3:0]         x2                      IO Backplane Revision
   Platform_Backplane_ID[7:4]x1                      IO Configurated in a 2P Drawer
   BP_Num[11:8]        xF                      IO Drawer 15 - Drawer 0 if 2P
   Hose[14:12]         x0                      IO Hose 0 or X-Shelf 0
IO7_UPH                x0000 0004 1042 4402    
   UPH_PID[10:0]       x402                    Logical IO7 # 2
   UPH_CD_REQ[15:11]   x8                      Up-Hose Credit Request
   UPH_CD_RIO[20:16]   x2                      Up-Hose Credit Read IO
   UPH_CD_WIO[25:21]   x2                      Up-Hose Credit Write IO
   UPH_CD_BLK[30:26]   x4                      Up-Hose Credit Block Request
   UPH_CD_NBK[35:31]   x8                      Up-Hose Credit Non-Block Request
   UPH_FR_CNT[36]      x0                      Force Errors as one-shot
   UPH_FR_HDR[37]      x0                      Force Error on Data Flit
   UPH_FR_SBE[38]      x0                      Force no action
   UPH_FR_DBE[39]      x0                      Force no action
   UPH_FR_GBG[40]      x0                      Force no action
   UPH_ARB_MODE[42:41] x0                      AGP Arb Mode. Port 7,0,1,2,3
HPI_CTL                x0000 0000 0000 0011    
   HP_Offset[4:0]      x11                     Standard Interrupt for IO Hot-Plug
   CPU_Target[22:14]   x0                      CPU 0
                       x0                      Target CPU for Hot-Plug Interrupt
   Hot_Plug_Int_Ena[24]x0                      Hot-Plug Interrupts Disabled
CRD_CTL                x0000 0000 0100 800A    
   CRD_Offset[4:0]     xA                      Standard Interrupt for IO Correctable Interrupts
   CPU_Target[22:14]   x2                      CPU 2
                       x2                      Target CPU for IO Correctable Interrupts
   CRD_Int_Ena[24]     x1                      Correctable Interrupts Enabled
HEI_CTL                x0000 0000 0100 801D    
   HEI_Offset[4:0]     x1D                     Standard Interrupt for IO Uncorrectable Interrupts
   CPU_Target[22:14]   x2                      CPU 2
                       x2                      Target CPU for Uncorrectable Interrupts
   Hard_Int_Ena[24]    x1                      Uncorrectable Interrupts Enabled
PO7_ERROR_SUM          x0000 0000 0000 0000    
   ERR_Valid[63]       x0                      Error Register was NOT Locked
PO7_UNCRR_SYM          x0000 0000 0000 0000    
   ERR_Cycle[15:9]     x0                      No Cycle Decode Error Occurred
   CLK_Symptoms[23:16] x0                      No Clock Errors Detected
   LSI_STRV[57:48]     x0                      LSI Interrupt,Bus,Slot and Intx
PO7_CRRCT_SYM          x0000 0000 0000 0000    
   SYN[6:0]            x0                      
   ERR_CYC[15:9]       x0                      No Cycle Decode Error Occurred
PO7_UGBGE_SYM          x0000 0000 0000 0000    
   UPH_PKT_Offset[37:6]x0                      Address and Stripe Bit
   UPH_PKT_Source_Port[51:48]x0                      UpHose Pkt Source - South Port0
   UPH_PKT_Dest_ID[62:52]x0                      UpHose Destination PID
   UPH_GBG_Sent_Valid[63]x0                      UpHose Garbage Register NOT Valid
PO7_ERR_PKT0           x03FE 1810 8A04 0806    
   Bits[4:0]           x6                      Down-Hose Address Bits <10:6>
   Op_Code[12:5]       x40                     Read Quardword
   Bits[44:34]         x604                    Down-Hose Address Bits <21:11>
   Bits[63:48]         x3FE                    Down-Hose Address Bits <37:22>
PO7_ERR_PKT1           x0000 1400 0000 1400    
PO0_ERR_SUM            x0000 0000 0000 0000    
   UPE_ERROR[58:52]    x0                      No Up-Hose Engine Error Detected
   ERR_Valid[63]       x0                      Error Register was NOT Locked
PO0_TLB_ERR            x0000 0000 0000 0000    
   ERR_CODE[1:0]       x0                      No Error Condition Occurred
   NOT_AGP[2]          x0                      AGP Transaction
   ERR_TLB_PTR[5:3]    x0                      Index to Failing TLB Entry
   FADDR[47:6]         x0                      Physical Memory Addr of TLB Entry Fetched
   TLB_ERR_VALID[63]   x0                      TLB Register was NOT Locked
PO0_SPL_COMPLT         x0000 0000 0000 0000    
   SPL_COMPLT[31:0]    x0                      Split Completion Message
   SPL_COMPLT_FUNC[34:32]x0                      Function Number
   SPL_COMPLT_DEV[39:35]x0                      Device Number
   SPL_COMPLT_BUS[47:40]x0                      Bus Number
PO0_TRANS_SUM          x0000 0000 0000 0000    
   PCI_ADDR[49:0]      x0                      
   PCIX_Master_SLOT[55:52]x0                      * IO7 was Master if Register Locked
   PCIX_CMD[59:56]     x0                      Interrupt Acknowledge
   ERR_VALID[63]       x0                      Error Register was NOT Locked
PO0_FIRST_ERR          x0000 0000 0000 0000    
   UPE_ERROR[58:52]    x0                      No UPE Error Detected
   ERR_Valid[63]       x0                      Error Register was NOT Locked
PO0_DM_SOURCE          x0000 0000 0000 0000    
PO0_DM_DEST            x0000 0000 0000 0000    
PO0_DM_SIZE            x0000 0000 0000 0000    
PO0_DM_CTRL            x0000 0000 0000 0002    
PO1_ERR_SUM            x0000 0000 0000 0000    
   UPE_ERROR[58:52]    x0                      No Up-Hose Engine Error Detected
   ERR_Valid[63]       x0                      Error Register was NOT Locked
PO1_TLB_ERR            x0000 0000 0000 0000    
   ERR_CODE[1:0]       x0                      No Error Condition Occurred
   NOT_AGP[2]          x0                      AGP Transaction
   ERR_TLB_PTR[5:3]    x0                      Index to Failing TLB Entry
   FADDR[47:6]         x0                      Physical Memory Addr of TLB Entry Fetched
   TLB_ERR_VALID[63]   x0                      TLB Register was NOT Locked
PO1_SPL_COMPLT         x0000 0000 0000 0000    
   SPL_COMPLT[31:0]    x0                      Split Completion Message
   SPL_COMPLT_FUNC[34:32]x0                      Function Number
   SPL_COMPLT_DEV[39:35]x0                      Device Number
   SPL_COMPLT_BUS[47:40]x0                      Bus Number
PO1_TRANS_SUM          x0000 0000 0000 0000    
   PCI_ADDR[49:0]      x0                      
   PCIX_Master_SLOT[55:52]x0                      * IO7 was Master if Register Locked
   PCIX_CMD[59:56]     x0                      Interrupt Acknowledge
   ERR_VALID[63]       x0                      Error Register was NOT Locked
PO1_FIRST_ERR          x0000 0000 0000 0000    
   UPE_ERROR[58:52]    x0                      No UPE Error Detected
   ERR_Valid[63]       x0                      Error Register was NOT Locked
PO1_DM_SOURCE          x0000 0000 0000 0000    
PO1_DM_DEST            x0000 0000 0000 0000    
PO1_DM_SIZE            x0000 0000 0000 0000    
PO1_DM_CTRL            x0000 0000 0000 0002    
PO1_HP_MISC            x0000 0000              
   Hot_Plug_Intr_Switch[24]x0                      
   Hot_Plug_Pwr_Fault[25]x0                      
   Hot_Plug_Prsnt_Change[26]x0                      
   Hot_Plug_Switch_Transition[27]x0                      
   Register_Valid[31]  x0                      Register VALID
PO1_HP_EVNT            x0000 0000              
   Register_Valid[31]  x0                      Register VALID
PO2_ERR_SUM            x0000 0000 0000 0000    
   UPE_ERROR[58:52]    x0                      No Up-Hose Engine Error Detected
   ERR_Valid[63]       x0                      Error Register was NOT Locked
PO2_TLB_ERR            x0000 0000 0000 0000    
   ERR_CODE[1:0]       x0                      No Error Condition Occurred
   NOT_AGP[2]          x0                      AGP Transaction
   ERR_TLB_PTR[5:3]    x0                      Index to Failing TLB Entry
   FADDR[47:6]         x0                      Physical Memory Addr of TLB Entry Fetched
   TLB_ERR_VALID[63]   x0                      TLB Register was NOT Locked
PO2_SPL_COMPLT         x0000 0000 0000 0000    
   SPL_COMPLT[31:0]    x0                      Split Completion Message
   SPL_COMPLT_FUNC[34:32]x0                      Function Number
   SPL_COMPLT_DEV[39:35]x0                      Device Number
   SPL_COMPLT_BUS[47:40]x0                      Bus Number
PO2_TRANS_SUM          x0000 0000 0000 0000    
   PCI_ADDR[49:0]      x0                      
   PCIX_Master_SLOT[55:52]x0                      * IO7 was Master if Register Locked
   PCIX_CMD[59:56]     x0                      Interrupt Acknowledge
   ERR_VALID[63]       x0                      Error Register was NOT Locked
PO2_FIRST_ERR          x0000 0000 0000 0000    
   UPE_ERROR[58:52]    x0                      No UPE Error Detected
   ERR_Valid[63]       x0                      Error Register was NOT Locked
PO2_DM_SOURCE          x0000 0000 0000 0000    
PO2_DM_DEST            x0000 0000 0000 0000    
PO2_DM_SIZE            x0000 0000 0000 0000    
PO2_DM_CTRL            x0000 0000 0000 0002    
PO2_HP_MISC            x8000 0000              
   Hot_Plug_Intr_Switch[24]x0                      
   Hot_Plug_Pwr_Fault[25]x0                      
   Hot_Plug_Prsnt_Change[26]x0                      
   Hot_Plug_Switch_Transition[27]x0                      
   Register_Valid[31]  x1                      Register NOT Valid
PO2_HP_EVNT            x8000 0000              
   Register_Valid[31]  x1                      Register NOT Valid
PO3_ERR_SUM            x0000 0000 0000 0000    
   UPE_ERROR[58:52]    x0                      No Up-Hose Engine Error Detected
   ERR_Valid[63]       x0                      Error Register was NOT Locked
PO3_TLB_ERR            x0000 0000 0000 0000    
   ERR_CODE[1:0]       x0                      No Error Condition Occurred
   NOT_AGP[2]          x0                      AGP Transaction
   ERR_TLB_PTR[5:3]    x0                      Index to Failing TLB Entry
   FADDR[47:6]         x0                      Physical Memory Addr of TLB Entry Fetched
   TLB_ERR_VALID[63]   x0                      TLB Register was NOT Locked
PO3_SPL_COMPLT         x0000 0000 0000 0000    
   SPL_COMPLT[31:0]    x0                      Split Completion Message
   SPL_COMPLT_FUNC[34:32]x0                      Function Number
   SPL_COMPLT_DEV[39:35]x0                      Device Number
   SPL_COMPLT_BUS[47:40]x0                      Bus Number
PO3_TRANS_SUM          x0000 0000 0000 0000    
   PCI_ADDR[49:0]      x0                      
   PCIX_Master_SLOT[55:52]x0                      * IO7 was Master if Register Locked
   PCIX_CMD[59:56]     x0                      Interrupt Acknowledge
   ERR_VALID[63]       x0                      Error Register was NOT Locked
PO3_FIRST_ERR          x0000 0000 0000 0000    
   UPE_ERROR[58:52]    x0                      No UPE Error Detected
   ERR_Valid[63]       x0                      Error Register was NOT Locked
PO3_DM_SOURCE          x0000 0000 0000 0000    
PO3_DM_DEST            x0000 0000 0000 0000    
PO3_DM_SIZE            x0000 0000 0000 0000    
PO3_DM_CTRL            x0000 0000 0000 0002    


Event:       8
Description: VMS Crash Restart Event  at Mon 6 Aug 2018 10:35:36 GMT+00:00 from THMG03  
File:        ./clue$errlog.sys
================================================================================

COMMON EVENT HEADER (CEH) V2.0
Event_Leader           xFFFF FFFE              
Header_Length           308                    
Event_Length            824                    
Header_Rev_Major        2                      
Header_Rev_Minor        0                      
OS_Type                 2                      -- OpenVMS
Hardware_Arch           4                      -- Alpha
CEH_Vendor_ID           3,564                  -- Hewlett-Packard Company
Hdwr_Sys_Type           39                     -- hp AlphaServer EV7 Series
Logging_CPU             0                      -- CPU Logging this Event
CPUs_In_Active_Set      0                      
Major_Class             37                     
Minor_Class             0                      
Entry_Type              37                     -- VMS Crash Restart Event 
DSR_Msg_Num             2,030                  -- AlphaServer ES47
Chip_Type               15                     -- EV7 - 21364
CEH_Device              0                      
CEH_Device_ID_0        x0000 0000              
CEH_Device_ID_1        x0000 0000              
CEH_Device_ID_2        x0000 0000              
Unique_ID_Count         27,301                 
Unique_ID_Prefix        0                      
Num_Strings             7                      

TLV Section of CEH
TLV_Sys_Serial_Num     IE84200129              
TLV_DSR_String         hp AlphaServer ES47 7/1150
TLV_Time_as_Local      Mon 6 Aug 2018 10:35:36 GMT+00:00
TLV_OS_Version         V7.3-2                  
TLV_Computer_Name      THMG03                  
Entry_Type              37                     

OpenVMS Crash Restart Event Data
KSP                    x0000 0000 7FF8 7EE0    
ESP                    x0000 0000 7FF8 C000    
SSP                    x0000 0000 7FF9 CD00    
USP                    x0000 0000 7AE7 B850    
R0                     x0000 0000 0000 0001    
R1                     xFFFF FFFF 86BF C000    
R2                     x0000 0000 0000 0210    
R3                     x0000 0000 0000 0001    
R4                     x0000 0000 0000 0000    
R5                     x0000 0008 0000 2000    
R6                     x0000 0000 0000 001A    
R7                     x0000 0000 0009 79C5    
R8                     x0000 0000 0000 0000    
R9                     x0000 0000 0007 C100    
R10                    x0000 0000 0008 C230    
R11                    x0000 0000 0008 C110    
R12                    x0000 0000 0005 006C    
R13                    x0000 0000 0002 0000    
R14                    x0000 0000 7C09 ED5C    
R15                    x0000 0000 0000 0001    
R16                    x0000 0000 0000 0215    
R17                    x0000 0000 0000 0000    
R18                    x0000 0000 0000 0210    
R19                    x0000 0000 0000 0006    
R20                    x0000 0000 0000 0040    
R21                    x0000 0000 0000 0000    
R22                    x0000 0000 0000 0000    
R23                    x0000 0000 0000 0000    
R24                    xFFFF FFFF 86BF C000    
R25                    x0000 0000 0000 0001    
R26                    xFFFF FFFF 8001 4FA4    
R27                    xFFFF FFFF 869D 1AC0    
R28                    xFFFF FFFF 8006 AA90    
FP                     x0000 0000 7FF8 7EE0    
SP                     x0000 0000 7FF8 7EE0    
PC                     xFFFF FFFF 8001 4FD0    
PSL                    x2000 0000 0000 1F04    
PTBR                   x0000 0000 0003 5EB4    
PCBB                   x0000 0000 6BD6 A080    
PRBR                   xFFFF FFFF 811C 7400    
VPTB                   xFFFF FEFA 0000 0000    
SCBB                   x0000 0000 0000 0F20    
SISR                   x0000 0000 0000 0000    
ASN                    x0000 0000 0000 007B    
ASTSR_ASTEN            x0000 0000 0000 000F    
FEN                    x0000 0000 0000 0001    
IPL                    x0000 0000 0000 001F    
MCES                   x0000 0000 0000 0000    
Crash_Code             x0000 0215              
Process_ID             x0002 00AE              
Process_Name_Length     14                     
Process_Name           RMI_grib_srvr3          
Crash_Code_String      MACHINECHK, Machine check while in kernel mode
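
For anyone reading the decode above: the bracketed ranges after each field name (for example Register_Valid[31] or UPE_ERROR[58:52]) are bit positions within the register value printed on the line above them. Below is a minimal Python sketch of that extraction. The helper names parse_reg and bit_field are made up for illustration and are not part of any decoder tool. The register values are copied from the dump.

```python
def parse_reg(text):
    """Convert a dump value such as 'x8000 0000' to an integer."""
    return int(text.lstrip("x").replace(" ", ""), 16)


def bit_field(value, hi, lo=None):
    """Return bits [hi:lo] of value, inclusive; a single bit if lo is omitted."""
    if lo is None:
        lo = hi
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# Values as printed in the dump above.
po2_hp_misc = parse_reg("x8000 0000")
po2_err_sum = parse_reg("x0000 0000 0000 0000")

print("Register_Valid[31]     =", bit_field(po2_hp_misc, 31))
print("Hot_Plug_Pwr_Fault[25] =", bit_field(po2_hp_misc, 25))
print("UPE_ERROR[58:52]       =", bit_field(po2_err_sum, 58, 52))
```

With PO2_HP_MISC at x8000 0000, only bit 31 is set, which matches the "Register NOT Valid" text in the decode.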

It looks like the Zbox memory controller is throwing the errors, which would suggest that the RIMM modules are the issue.
Any thoughts there? (Apart from reseating the parts.)


