Re: Interpretation of a hardware error

Peter Kjellström <cap@xxxxxxxxxx> · Fri, 13 Apr 2012 11:42:13 +0200

On Thursday 12 April 2012 13.36.03 m.roth@xxxxxxxxx wrote:
> Hey, folks,
> 
> I've just started seeing
> Apr 12 13:09:59 <server> kernel: [Hardware Error]:
> MC4_STATUS[Over|CE|MiscV|-|AddrV|-|Poison|CECC]: 0xdd0accf2001d011b
> Apr 12 13:09:59 <server> kernel: [Hardware Error]: Northbridge Error (node
> 1, core 1): ECC error in L3 cache tag.

The error message certainly points to the CPU. The fact that the error
occurred in the L3 cache tag, not the cache data, further implicates the CPU.

The message is quite specific and I'd say rather trustworthy...
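
For what it's worth, the flags in the brackets can be read straight off the
64-bit status value. Below is a minimal Python sketch of that decoding, not
the kernel's actual code. Bits 57-62 are the architectural x86 machine-check
status bits. The Deferred (44), Poison (43) and ECC (46:45) positions are an
assumption: they are what I believe the Linux EDAC decoder uses for AMD
family 15h, which the seven-field flag list suggests but the log does not
confirm. The names bit() and decode_status() are made up for illustration.

```python
# Sketch: rebuild the flag string the kernel printed from the raw
# MC4_STATUS value. Bit positions 57-62 are architectural x86 MCA bits;
# Deferred/Poison/ECC positions are ASSUMED for AMD family 15h.

def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_status(status: int) -> str:
    """Return the flags in the same order as the log line."""
    fields = [
        "Over" if bit(status, 62) else "-",      # an earlier error was overwritten
        "UE" if bit(status, 61) else "CE",       # uncorrected vs. corrected
        "MiscV" if bit(status, 59) else "-",     # MISC register is valid
        "PCC" if bit(status, 57) else "-",       # processor context corrupt
        "AddrV" if bit(status, 58) else "-",     # ADDR register is valid
        "Deferred" if bit(status, 44) else "-",  # assumed family 15h bit
        "Poison" if bit(status, 43) else "-",    # assumed family 15h bit
    ]
    ecc = (status >> 45) & 0x3                   # assumed ECC field, bits 46:45
    if ecc:
        fields.append("CECC" if ecc == 2 else "UECC")
    return "|".join(fields)


if __name__ == "__main__":
    print(decode_status(0xdd0accf2001d011b))
```

Run against the value from the log, this reproduces the flag string shown
there: the UC bit is clear and the ECC field reads as correctable, so the
hardware reports having corrected the error. "Over" means at least one earlier
error was overwritten before it could be logged.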

But there's also the possibility that the message is wrong (either something
else went wrong or nothing really went wrong). In my experience, hardware
fault error messages are quite unreliable, and at the end of the day DIMMs are
orders of magnitude more likely to fail than CPUs...

/Peter
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos