Adding to what Chris suggests.

When ssh fails, always ping the IP address. If the ping responds, then the kernel is up in some state (during heavy paging or a deadlock, ping generally still responds as long as the kernel is running and has not crashed).

If ping does not respond, either the network has died or the kernel has crashed. The network does not usually stop responding unless someone takes it down by mistake, though I have seen at least one network card whose crashes dropped the network many times; that is easy to diagnose since it logs the issue.

Enabling crash dumps might be a good idea. If the node boots back up without collecting a dump, or without even trying to collect one, that is often a sign of a hardware fault that forced an immediate reset of the hardware.

On Sat, Jul 31, 2021 at 1:01 AM Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Jul 30, 2021 at 2:00 PM Roger Heflin <rogerheflin@xxxxxxxxx> wrote:
> >
> > If it was just a plasma crash, then ssh and/or the alt keys would have
> > worked to switch terminals.
> >
> > Details said neither worked. The kernel and/or a significant part of
> > userspace was deadlocked and/or crashed.
>
> I wonder if logs contain anything... i.e. from the boot following the
> failed update, use journalctl -b-1 and if it's 5 boots back use -b-5
>
> It might have the start of the problem anyway. I also suspect a
> deadlock. It can make it seem like ssh is dead but it's just super
> slow. Or may even time out unless a session has already started.
> Workstation edition and KDE spin have improved resource control, which
> is a work in-progress (also on KDE you will need to install
> uresourced). This attempts to ensure minimum resources are available
> for the desktop to be responsive. One possible limitation is IO
> pressure, we're not quite there yet implementing IO isolation. A
> deadlock though is a different problem so the resource control work
> wouldn't help.
>
> If you ever see "task xxx:yyy blocked for more than 120 seconds" it's
> best to issue sysrq+w (i.e. echo w > /proc/sysrq-trigger) to dump
> extra debugging information into the kernel message buffer, and then
> file a bug attaching dmesg.
>
>
> --
> Chris Murphy
> _______________________________________________
> users mailing list -- users@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe send an email to users-leave@xxxxxxxxxxxxxxxxxxxxxxx
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/users@xxxxxxxxxxxxxxxxxxxxxxx
> Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
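
For anyone who wants to script the ping-then-ssh check described in this thread, here is a rough sketch run from a second machine. The function names (classify, triage), the result labels, and the timeouts are my own assumptions, not anything from Fedora or from the posts above, and the ping flags assume the Linux iputils ping.

```shell
#!/bin/sh
# Sketch only: helper names, labels and timeouts are illustrative assumptions.

# classify PING_RC SSH_RC
# Turn the two probe exit statuses (0 = success) into a rough diagnosis.
classify() {
    if [ "$2" -eq 0 ]; then
        # ssh answered, so the kernel and userspace are both alive.
        echo "ssh-ok"
    elif [ "$1" -eq 0 ]; then
        # ping answers but ssh does not: the kernel is up in some state,
        # userspace may be deadlocked, paging heavily, or just very slow.
        echo "kernel-up-userspace-stuck"
    else
        # No ping reply: either the network died or the kernel crashed.
        echo "network-or-kernel-down"
    fi
}

# triage HOST
# Probe HOST with ping and a non-interactive ssh login, then classify.
triage() {
    host=$1
    # -c 3: send three probes; -W 2: wait up to 2 seconds per reply.
    ping -c 3 -W 2 "$host" >/dev/null 2>&1
    ping_rc=$?
    # BatchMode avoids hanging on a password prompt; ConnectTimeout bounds
    # the connection attempt (assumed value, raise it for a slow machine).
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true >/dev/null 2>&1
    ssh_rc=$?
    classify "$ping_rc" "$ssh_rc"
}
```

Run it as "triage 192.0.2.10" with the machine's real address in place of that documentation address. A "kernel-up-userspace-stuck" result does not prove a deadlock, since ssh on a badly overloaded machine can simply time out; the journalctl -b-1 output after the next boot is still the place to look for the cause.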