https://bugzilla.kernel.org/show_bug.cgi?id=201825

            Bug ID: 201825
           Summary: Guest freezes during high IO load, host partially
                    freezes, stack points to try_async_pf
           Product: Virtualization
           Version: unspecified
    Kernel Version: 4.19.2
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx
          Reporter: llowrey@xxxxxxxxxxxxxxxxx
        Regression: No

Created attachment 279759
  --> https://bugzilla.kernel.org/attachment.cgi?id=279759&action=edit
SysRq l

I've been getting regular freezes since kernel 4.17.19, and in all cases the stack traces include calls to try_async_pf.

The problem started after I "upgraded" to kernel 4.17.19. I reverted to 4.17.12 and the problem did not occur again. I have suffered these freezes with every kernel after 4.17.19. I did not test kernels between 4.17.12 and 4.17.19, so I can't identify which version introduced the problem.

The host is running 6 guests (either RHEL7 or Fedora 29). During periods of high disk I/O (e.g. nightly backups), one or more of the guests will freeze and the host will partially freeze. There are no errors on the console.

What I mean by the host partially freezing is that existing processes seem to operate normally (backups complete, the web server continues to serve pages, etc.) but most new commands hang and Ctrl-C does not kill them. I am unable to ssh in completely: I can connect and authenticate, but the connection hangs before I get a shell prompt. The same happens when trying to log in via the console. Certain commands like cat and echo work when used with files in /proc, but just about anything else will hang.

I've been able to get sysrq stack dumps in many cases and I'll attach some to this ticket. The traces all point to the guest performing an async page fault.

The host will typically have > 30GB of RAM free but also > 10GB of swap in use.
The swap device was originally on a RAID 5 of SSDs. I moved it to a simple partition on an NVMe device, but the problem continued.

REPRODUCTION

1. Run several KVM guest VMs
2. Allow the guests to swap
3. Apply heavy IO load to storage
4. Exercise a VM to encourage a page fault
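Since reads of files under /proc reportedly keep working while the host is partially frozen, the memory and swap figures mentioned above could be logged from an already-running process. The following is only a minimal sketch, not part of the original report: the function names (parse_meminfo, swap_summary, read_summary) are hypothetical, and it assumes the standard MemAvailable, MemFree, SwapTotal and SwapFree fields of /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are reported in kB; a few (e.g. HugePages_*) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_summary(info):
    """Return (available_gib, swap_used_gib) from a parsed meminfo dict."""
    kib_per_gib = 1024 * 1024
    # Fall back to MemFree if MemAvailable is missing (an assumption for
    # very old kernels; not something the report mentions).
    available = info.get("MemAvailable", info.get("MemFree", 0))
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return available / kib_per_gib, swap_used / kib_per_gib


def read_summary(path="/proc/meminfo"):
    """Read the given meminfo file and summarise available RAM and used swap."""
    with open(path) as f:
        return swap_summary(parse_meminfo(f.read()))
```

Calling read_summary() periodically from a process started before the backups begin, and appending the result to a log, might show how available RAM and swap usage evolve in the run-up to a freeze. Whether such a logger keeps running during the freeze is untested here.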