Re: system unresponsive




On Wed, May 22, 2019 at 10:22 AM mark <m.roth@xxxxxxxxx> wrote:

> It seems unlikely. It's a 4U server, with 36 disks (and the dual root
> disks), in a machine room, and ipmitool sel list shows nada, nor are there
> any warnings, as I've seen on other systems occasionally, that the CPU is
> overheating, and is being throttled.


If this is a recent server (Ivy Bridge/Haswell/Broadwell), then I’ve seen the
“edac” kernel module prevent the SEL from showing faults when a
machine check exception (MCE) occurs. Disable edac and, poof, the server stops
crashing and/or the SEL shows something useful (ECC/MCE). Did you check
/var/log/mcelog?
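A quick way to check both things mentioned above is to see which EDAC modules are loaded and whether /var/log/mcelog has anything in it. This is only a minimal sketch, assuming a Linux host where /proc/modules lists one module per line with the name first. Treating any module whose name contains "edac" (e.g. sb_edac, edac_core) as EDAC-related is an assumption, and the function names are hypothetical. It only inspects; it does not disable anything.

```python
from pathlib import Path
from typing import List


def loaded_edac_modules(proc_modules_text: str) -> List[str]:
    """Return loaded kernel module names that look EDAC-related.

    Assumes the /proc/modules format, where each line starts with the
    module name, e.g. "sb_edac 32768 0 - Live 0x0000000000000000".
    Matching on the substring "edac" is a heuristic, not an official rule.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and "edac" in fields[0]:
            names.append(fields[0])
    return names


def mcelog_has_entries(path: str = "/var/log/mcelog") -> bool:
    """Return True if the mcelog log file exists and is non-empty."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0


if __name__ == "__main__":
    modules = Path("/proc/modules")
    if modules.exists():
        found = loaded_edac_modules(modules.read_text())
        print("EDAC modules loaded:", ", ".join(found) if found else "none")
    print("/var/log/mcelog has entries:", mcelog_has_entries())
```

If EDAC modules show up and mcelog is empty while the box keeps hanging, that matches the situation described above.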
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos
