I have been trying to get a 3.2 or later kernel with support for more than 64 CPUs working on an SGI Altix 4700, with no luck. Every kernel I have built reports the same error: Non-existent Memory Address Error. I am not sure whether this is a kernel issue, but below is the output from the POD, which was entered because of an MCA. Any pointers or help would be greatly appreciated.
Short error output:
000 051.21^1#0a: index time stamp         type      component    subcomponent
000 051.21^1#0a: ----- ------------------ --------- ------------ ------------
000 051.21^1#0a: 0     0x000000c92deef702 MD_HW     051.21^1#0   Non-existent Memory Address Error
000 051.21^1#0a: 1     0x000000ce43920f08 PI_HW     051.21^1#0   RRB Time-out Error
000 051.21^1#0a: 2     0x000000ce43b16400 PROC_MCA  051.21^1#0a  Bus Check
000 051.21^1#0a: SH2_EVENT_OCCURRED : 0x0000008180000003
000 051.21^1#0a: MD Hardware Interrupt Pending
000 051.21^1#0a: SH2_FIRST_ERROR : 0x0000000000000002
000 051.21^1#0a: MD Hardware Interrupt Pending
000 051.21^1#0a: SH2_MEM_ERROR_SUMMARY : 0x0000007800000002
000 051.21^1#0a: Non-existent Memory Address Error
000 051.21^1#0a: SH2_MEM_FIRST_ERROR : 0x0000000000000002
000 051.21^1#0a: MD_HW_INT: Non-existent Memory Address Error
000 051.21^1#0a: SH2_MISC_ERR_HDR_UPPER : 0x0000000001f00004
000 051.21^1#0a: Non-Existant Memory Address Error Header Captured
000 051.21^1#0a: Echo: 0x1f
000 051.21^1#0a: SH2_MISC_ERR_HDR_LOWER : 0x8800010000000000
000 051.21^1#0a: Source : pi chiplet, nasid 0x0
000 051.21^1#0a: Command : NCRD, Non-coherent read
000 051.21^1#0a: Read Operation
000 051.21^1#0a: SH2_MISC_ADRS_ERR_HDR_LOWER_A : 0x80000001014cf070
000 051.21^1#0a: Address <37:0>: 0x1014cf070
000 051.21^1#0a: Read Operation
000 051.21^1#0a: SH2_MD_HW_TIME_STAMP : 0x800000fa22fade06
000 051.21^1#0a:
000 051.21^1#0a: PI_HW :051.21^1#0 :RRB Time-out Error
000 051.21^1#0a:
000 051.21^1#0a: SH2_EVENT_OCCURRED : 0x0000008180000003
000 051.21^1#0a: PI Hardware Interrupt Pending
000 051.21^1#0a: SH2_FIRST_ERROR : 0x0000000000000002
000 051.21^1#0a: SH2_PI_ERROR_SUMMARY : 0x0000000000000010
000 051.21^1#0a: RRB Time-out Error
000 051.21^1#0a: SH2_PI_FIRST_ERROR : 0x0000000000000010
000 051.21^1#0a: RRB Time-out Error
000 051.21^1#0a: SH2_PI_ERROR_DETAIL_1 : 0xfe200001014cf071
000 051.21^1#0a: SH2_PI_ERROR_DETAIL_2 : 0x000000001f0801f1
000 051.21^1#0a: Address : 0x1014cf070
000 051.21^1#0a: Table Select : 0x4
000 051.21^1#0a: Command : RESERVED_FE
000 051.21^1#0a: IsReal : 0x1
000 051.21^1#0a: RRB Idx : 0x1f
000 051.21^1#0a: WRB Idx : 0x0
000 051.21^1#0a: IRB Idx : 0x0
000 051.21^1#0a: Error Code : 0x4
000 051.21^1#0a: Echo : 0x1f
000 051.21^1#0a: Source : not available
000 051.21^1#0a: Supplemental : 0x0
000 051.21^1#0a: AXB Queue : 0x0
000 051.21^1#0a: SH2_PI_HW_TIME_STAMP : 0x800000feba7afc05
000 051.21^1#0a:
000 051.21^1#0a: PROC_MCA :051.21^1#0a :Bus Check
000 051.21^1#0a:
000 051.21^1#0a: processor lid : 0x0000000000000000
000 051.21^1#0a: cpu: A nasid: 0x0
000 051.21^1#0a: processor state parameter : 0x20010000fff21120
000 051.21^1#0a: rendevous was not attempted
000 051.21^1#0a: min state is valid
000 051.21^1#0a: not continuable
000 051.21^1#0a: machine check is isolated
000 051.21^1#0a: more info available
000 051.21^1#0a: ip logged is not precise
000 051.21^1#0a: min state is not precise
000 051.21^1#0a: shared MCA
000 051.21^1#0a: bus check
000 051.21^1#0a: PAL recovery status:
000 051.21^1#0a: error was isolated and contained, continuable if sw can recover
000 051.21^1#0a: processor error map : 0x0000000001000000
000 051.21^1#0a: processor code id: 0
000 051.21^1#0a: logical thread id: 0
000 051.21^1#0a: processor bus level 1 error
000 051.21^1#0a: processor structure: bus
000 051.21^1#0a: bus check : 0x1880000000800141
000 051.21^1#0a: bus transaction size: 1
000 051.21^1#0a: external bus error
000 051.21^1#0a: transaction type: partial read
000 051.21^1#0a: bus error severity: 0
000 051.21^1#0a: bus hierarchy: 0
000 051.21^1#0a: UCE detected on incoming
000 051.21^1#0a: ia64 instruction set
000 051.21^1#0a: machine check corrected
000 051.21^1#0a: target address valid
000 051.21^1#0a: target identifier : 0x00000001014cf071
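
As a sanity check on the decoded output, here is a minimal sketch that recovers the faulting address from the raw SH2_MISC_ADRS_ERR_HDR_LOWER_A value. The only layout it relies on is the "Address <37:0>" line in the log; the constant and function names are my own, and I am not assuming anything about what the remaining bits mean.

```python
# Raw values copied from the POD output above.
MD_ADRS_ERR_HDR_LOWER_A = 0x80000001014cf070  # SH2_MISC_ADRS_ERR_HDR_LOWER_A
TARGET_IDENTIFIER = 0x00000001014cf071        # PROC_MCA "target identifier"

# Bits <37:0>, taken from the "Address <37:0>" line in the log.
ADDR_MASK = (1 << 38) - 1


def address_field(raw):
    """Return bits <37:0> of a raw 64-bit register value."""
    return raw & ADDR_MASK


md_addr = address_field(MD_ADRS_ERR_HDR_LOWER_A)
print(hex(md_addr))

# XOR shows which bits differ between the MD address and the
# processor's target identifier.
print(hex(md_addr ^ TARGET_IDENTIFIER))
```

The masked value matches the address the POD printed (0x1014cf070), and it differs from the processor's target identifier only in bit 0. I do not know what that low bit encodes, but the MD, PI and processor records all appear to point at the same location.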