Stephen John Smoogen wrote:
> On Wed, 22 May 2019 at 09:30, mark <m.roth@xxxxxxxxx> wrote:
>
>> Ok, we used to get this occasionally on cluster nodes, and we just got
>> it on a fileserver (very bad). The system is discovered to be
>> unresponsive: it doesn't ping, and plugging a console in, you can see
>> that it's not dead, but there's nothing at all on the screen, nor does
>> it respond to even <ctrl-alt-del>. The only answer is to power cycle
>> it; it comes up fine.
>>
>> Nothing in /var/log/dmesg or /var/log/messages. No abrts I can find.
>> sar tells me it went unresponsive between 18:10 and 10:20 yesterday.
>> Note that there are no further entries in sar, either, for yesterday,
>> after the event, and nothing till I power cycled it.
>>
> From the above description, I would normally say it sounds like hardware.
> However, why do you say the system is not dead when you plug in a
> console, but there is nothing on the screen and it doesn't respond to
> control-alt-delete? To me that sounds like 'dead'. Usually the cpu is
> hardlocked or the hardware went into 'over-heat' and put everything in a
> deep sleep hoping it would cool down but never wake up.
>
It seems unlikely. It's a 4U server, with 36 disks (and the dual root
disks), in a machine room, and ipmitool sel list shows nada, nor are there
any warnings, as I've seen on other systems occasionally, that the CPU is
overheating, and is being throttled.
>
>> Has anyone else seen this - I can't imagine it's only us - or have any
>> thoughts?
>>
>> C 7, 7.6.1810
>>
>> mark
>
> --
> Stephen J Smoogen.
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos