If you cannot log in to the machine via ssh, also try pinging it. If ping works but ssh does not, either sshd died or the machine is paging so heavily that user space cannot respond in a reasonable time.

If the disk were the issue, there should be messages from the disk layer about something timing out, but it sounds like there aren't any of those. If it were a controller, PCI slot, or other hardware issue, that will in some cases cause an immediate power cycle and boot back up.

You might also configure kdump; there should be docs somewhere on configuring it for your distribution. Once it is configured, test it with "echo c > /proc/sysrq-trigger". That should crash the machine and leave you with a kernel core dump plus the dmesg from the time of the crash. Also, if kdump is configured and working, a crash will dump memory and the machine will typically boot back up automatically.

On Wed, Dec 15, 2021 at 3:54 AM Wols Lists <antlists@xxxxxxxxxxxxxxx> wrote:
>
> Don't know if this is off-topic or not, seeing as my system is very much
> reliant on raid ...
>
> But basically I'm seeing the system just stop responding. Typically it's
> in screensaver mode, I've got a blank screen, and it won't wake up. (I
> used to think it was something to do with Thunderbird, it mostly
> happened while TB was hammering the system, but no ...)
>
> Today, I had it happen while the system was idle but not in screensaver,
> I run xosview, and everything was clearly frozen - including xosview.
>
> As you might know, my stack is ext4 over lvm (over raid over
> dm-integrity for /home) over spinning rust.
>
> And I run gentoo/systemd - currently on the latest stable kernel afaik,
> 5.10.76-gentoo-r1 SMP x86_64.
>
> Any advice on how to debug a hang - basically I need something that'll
> just sit there so when it crashes (and I press the reset button to
> recover) I'll have some sort of trace. It would be nice to prove it's
> not the disk stack at fault ...
>
> Obviously, "set these options in the kernel" won't faze me ...
>
> Cheers,
> Wol
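PS: a minimal sketch of the ping/ssh triage and the kdump test from my reply, as POSIX sh functions. The function names are hypothetical, not part of any kdump package. It assumes the kernel exposes /sys/kernel/kexec_crash_loaded (it reads 1 once a capture kernel has been loaded with "kexec -p") and that key-based ssh login is already set up, since with BatchMode a failed authentication also counts as an ssh failure.

```shell
#!/bin/sh
# Hypothetical helpers; not part of any distro's kdump tooling.

# kdump_ready [FILE]: succeed only if a crash (capture) kernel is loaded.
# Assumes /sys/kernel/kexec_crash_loaded reads "1" after "kexec -p";
# FILE overrides the path (handy for testing).
kdump_ready() {
    f=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

# kdump_crash_test: the "echo c > /proc/sysrq-trigger" test, guarded so it
# refuses to run when no capture kernel is loaded.
# WARNING: when it does run, the machine crashes immediately. Run as root.
kdump_crash_test() {
    if ! kdump_ready; then
        echo "no crash kernel loaded, not crashing" >&2
        return 1
    fi
    sync                        # flush dirty data before the deliberate crash
    echo c > /proc/sysrq-trigger
}

# classify_hang PING_RC SSH_RC: map exit codes (0 = success) to a rough guess.
classify_hang() {
    if [ "$1" -ne 0 ]; then
        echo "no-ping"          # kernel/network not answering, or box is down
    elif [ "$2" -ne 0 ]; then
        echo "ping-only"        # kernel answers, user space doesn't: sshd died or heavy paging
    else
        echo "ok"
    fi
}

# probe_host HOST: run both checks from a second machine.
# Assumes key-based ssh login; BatchMode makes ssh fail instead of prompting.
probe_host() {
    ping -c 3 -W 2 "$1" >/dev/null 2>&1
    p=$?
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true >/dev/null 2>&1
    s=$?
    classify_hang "$p" "$s"
}
```

From another machine, something like "probe_host mybox" (hostname hypothetical) while the box is frozen tells you which of the two cases you are in; only call kdump_crash_test when you are ready for the box to go down.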