Re: Can't find out MCE reason (CPU 35 BANK 8)




Vladimir Budnev wrote:
> Hello community.
>
> We are running CentOS 4.8 on a SuperMicro SYS-6026T-3RF with 2x Intel Xeon
> E5630 and 8x Kingston KVR1333D3D4R9S/4G.
>
> For some time we have been seeing lots of MCEs in mcelog and we can't find
> out the reason.

The only thing that shows up there (when it shows up at all, since sometimes
it doesn't seem to) is a hardware error. You *WILL* be replacing hardware
soon, like yesterday.

"Normal" it is not: *ANYTHING* in that log is Bad News. First, you've got
DIMMs failing. CPU 53, assuming this system doesn't have 53+ physical CPUs,
is a core number on a system with x cores per physical chip, so you need to
divide by x to find the physical chip. So if it's, say, a 12-core system
with 6 physical chips, that would make it DIMM 8 associated with that
physical CPU.
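
Rather than doing the division by hand, you can ask the kernel which
physical chip each logical CPU sits on. This is only a rough sketch: it
assumes your kernel's /proc/cpuinfo carries "processor" and "physical id"
lines, and that the CPU numbers mcelog prints match the kernel's own
numbering, which may not hold on an old release. The function name
cpu_to_package is made up for illustration.

```shell
# Hypothetical helper: print "logical_cpu physical_id" pairs from
# /proc/cpuinfo-style text on stdin.
# Assumes each "processor" line is followed by a "physical id" line.
cpu_to_package() {
    awk -F': *' '
        /^processor/   { cpu = $2 }          # remember the logical CPU number
        /^physical id/ { print cpu, $2 }     # pair it with its physical chip
    '
}

# Usage on the affected box:
#   cpu_to_package < /proc/cpuinfo
```

Logical CPUs that share a physical id are on the same chip, so the DIMMs to
suspect are the ones wired to that socket.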
<snip>
> One more interesting thing is the following output:
> [root@zuno]# cat /var/log/mcelog |grep CPU|sort|awk '{print $2}'|uniq
> 32
> 33
> 34
> 35
> 50
> 51
> 52
> 53
>
> Those numbers are always the same.

Bad news: you have *two* DIMMs failing, one associated with the physical
CPU that has core 53, and another associated with the physical CPU that
has cores 32-35.
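
It can also help to see how often each CPU shows up, not just which ones.
A small variation on the pipeline quoted above, as a sketch only: it
assumes, as that command does, that the CPU number is the second field on
every line containing "CPU". The function name mce_counts_per_cpu is made
up for illustration.

```shell
# Hypothetical helper: count mcelog entries per CPU number.
# Reads log text on stdin and prints "cpu count" lines, sorted by CPU number.
mce_counts_per_cpu() {
    grep CPU |
        awk '{print $2}' |          # CPU number is assumed to be field 2
        sort -n |
        uniq -c |
        awk '{print $2, $1}'        # normalise to "cpu count"
}

# Usage on the affected box:
#   mce_counts_per_cpu < /var/log/mcelog
```

If one group of cores accounts for most of the entries, that is the socket
whose DIMMs I would pull first.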

Talk to your OEM support to help identify which banks need replacing,
and/or find a motherboard diagram.

          mark, who has to deal *again* with one machine with the same
problem....

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos

