I recently upgraded to 2.6.18-194.3.1.el5, and within several days the machine crashed with the following error (repeating in mcelog):

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 BANK 8 MISC 41
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
Processor context corrupt
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
Memory address parity error
Memory corrected error count (CORE_ERR_CNT): 911
Memory transaction Tracker ID (RTId): 41
Memory DIMM ID of error: 0
Memory channel ID of error: 0
Memory ECC syndrome: 0
STATUS ea10e3c0008000b0 MCGSTATUS 0

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 BANK 8 MISC 41
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
Processor context corrupt
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
Memory address parity error
Memory corrected error count (CORE_ERR_CNT): 7970
Memory transaction Tracker ID (RTId): 41
Memory DIMM ID of error: 0
Memory channel ID of error: 0
Memory ECC syndrome: 0
STATUS ea17c880008000b0 MCGSTATUS 0

Every time the error occurs, the only values that change are CORE_ERR_CNT and STATUS.

Since this appears to be a memory error, I have run memtest86+ many times, but it does not report any errors.

After reverting to the older kernels listed below and testing, the error above was generated only once (after boot) and then not reported again. It definitely did not cause a kernel panic or crash the machine.

CentOS-5.4 (2.6.18-164.15.1.el5)
CentOS (2.6.18-164.9.1.el5)
CentOS (2.6.18-164.el5)

Would this error indicate a motherboard or CPU problem? How can I diagnose it? Or is there something wrong with the kernel?

Hardware:
Supermicro X8DTL-iF motherboard
Intel Server Xeon E5502 1.86GHz Nehalem
8GB RAM, Kingston DDR3-1333 w/ Parity w/ Thermal Sensor

I have read a Bugzilla note about mcelog not supporting Nehalem processors during error decoding. I think this is fixed in CentOS 5.5, but maybe there is still a bug?

https://bugzilla.redhat.com/show_bug.cgi?id=473392
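
As a cross-check on the mcelog decoding, here is a minimal Python sketch that decodes the two STATUS values by hand. It assumes the architectural IA32_MCi_STATUS layout as I understand it from the Intel SDM (flag bits 63 to 57, MCA error code in bits 15:0, memory controller codes of the form 0000 0000 1MMM CCCC). The function names `decode_status` and `changed_bits` are my own and are not part of mcelog.

```python
# Flag bits of IA32_MCi_STATUS (assumed from the Intel SDM machine-check chapter).
FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# MMM field of a memory controller MCA error code (0000 0000 1MMM CCCC).
MEM_TRANSACTION = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "address/command error",
    4: "memory scrubbing error",
}


def decode_status(status):
    """Decode the flag bits and the MCA error code of one STATUS value."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF
    result = {"flags": flags, "mca_code": mca_code}
    if mca_code & 0xFF80 == 0x0080:  # memory controller error format
        result["transaction"] = MEM_TRANSACTION.get((mca_code >> 4) & 0x7, "reserved")
        channel = mca_code & 0xF
        result["channel"] = None if channel == 0xF else channel  # 0xF: not specified
    return result


def changed_bits(a, b):
    """Return the bit positions in which two STATUS values differ."""
    diff = a ^ b
    return [bit for bit in range(64) if (diff >> bit) & 1]


if __name__ == "__main__":
    first, second = 0xEA10E3C0008000B0, 0xEA17C880008000B0
    for value in (first, second):
        print(hex(value), decode_status(value))
    print("bits that differ:", changed_bits(first, second))
```

Both values decode to the same flags and the same address/command error on channel 0, which agrees with the mcelog text. The bits that differ all fall between 38 and 50, inside the range the SDM describes as the corrected error count (bits 52:38), so STATUS seems to change only because the counter changes.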