Re: New kernel causes hardware error?




Thanks Guys!

Your advice helped me fix the problem.

Yes, the motherboard was the issue. I updated the firmware, which 
must have included some microcode fixes to support my CPU (John mentioned 
that the memory controller is in the CPU on the Xeon 5500).

Now, after rebooting into 2.6.18-194.3.1.el5, no errors show up in mcelog.

I will do some further testing, but I think I'm in the clear.


Thank you so much! I spent hours googling for a solution to this and 
couldn't find the error reported anywhere else. I'm glad to have some 
people I can turn to for advice.


All the best,
eric



Tsuyoshi Nagata wrote:
> Hi! Eric
> (2010/06/22 13:11), Eric Deis wrote:
>   
>> Transaction: Address/Command error
>>     
>
> It's a motherboard (memory controller) problem.
> It's *not* a DIMM problem. (memtest can't detect this error.)
> Your data transfers (read/write) sometimes hit bit errors.
> This is the Nehalem CPU's error-detection feature (MCE).
>
> Try a new motherboard,
> or, if your MB always indicates this error with the latest kernel,
> it's time to buy hardware from a certified vendor.
>
> Supermicro's MB is not certified hardware, but
> it just indicates a hardware problem.
>
> Tsuyoshi.
>   