On Thu, Jan 04, 2024, Friedrich Weber wrote:
> Hi,
>
> some of our (Proxmox VE) users have been reporting [1] that guests
> occasionally become unresponsive with high CPU usage for some time
> (varying between ~1 and more than 60 seconds). After that time, the
> guests come back and continue running fine. Windows guests seem most
> affected (not responding to pings during the hang, RDP sessions time
> out), but we also got reports about Linux guests. This issue was not
> present while we provided (host) kernel 5.15 and was first reported when
> we rolled out a kernel based on 6.2. The reports seem to concern NUMA
> hosts only. Users reported that the issue becomes easier to trigger the
> more memory is assigned to the guests. Setting mitigations=off was
> reported to alleviate (but not eliminate) the issue. The issue seems to
> disappear after disabling KSM.
>
> We can reproduce the issue with a Windows guest on a NUMA host, though
> only occasionally and not very reliably. Using a bpftrace script like
> [7] we found the hangs to correlate with long-running invocations of
> `task_numa_work` (more than 500ms), suggesting a connection to the NUMA
> balancer. Indeed, we can't reproduce the issue after disabling the NUMA
> balancer with `echo 0 > /proc/sys/kernel/numa_balancing` [2], and got a
> user confirming this fixes the issue for them [3].
>
> Since the Windows reproducer is not very stable, we tried to find a
> Linux guest reproducer and have found one (described below [0]) that
> triggers a very similar (hopefully the same) issue. The reproducer
> triggers the hangs also if the host is on current Linux 6.7-rc8
> (610a9b8f). A kernel bisect points to the following as the commit
> introducing the issue:
>
> f47e5bbb ("KVM: x86/mmu: Zap only TDP MMU leafs in zap range and
> mmu_notifier unmap")
>
> which is why I cc'ed Sean and Paolo. Because of the possible KSM
> connection I cc'ed Andrew and linux-mm.
> Indeed, on f47e5bbb~1 = a80ced6e ("KVM: SVM: fix panic on out-of-bounds
> guest IRQ") the reproducer does not trigger the hang, and on f47e5bbb it
> does trigger the hang.
>
> Currently I don't know enough about the KVM/KSM/NUMA balancer code to
> tell how the patch may trigger these issues. Any idea who we could ask
> about this, or how we could further debug it, would be greatly
> appreciated!

This is a known issue. It's mostly a KVM bug[1][2] (fix posted[3]), but I
suspect that a bug in the dynamic preemption model logic[4] is also
contributing to the behavior by causing KVM to yield on preempt models
where it really shouldn't.

[1] https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@xxxxxxxxxxxxxxxxxxxxxxxxx
[2] https://lore.kernel.org/all/bug-218259-28872@xxxxxxxxxxxxxxxxxxxxxxxxx%2F
[3] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@xxxxxxxxxx
[4] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@xxxxxxxxxx
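For anyone who wants to check whether their hangs correlate with NUMA balancing the same way, a probe along these lines can flag long `task_numa_work` invocations. This is a hypothetical sketch, not the actual script referenced as [7] in the report; it assumes a host with bpftrace and kprobe/kretprobe support:

```
// Hypothetical sketch (not the reporter's script [7]):
// record entry time of task_numa_work(), and on return print any
// invocation that ran longer than 500 ms.
kprobe:task_numa_work {
    @start[tid] = nsecs;
}

kretprobe:task_numa_work /@start[tid]/ {
    $dur_ms = (nsecs - @start[tid]) / 1000000;
    if ($dur_ms > 500) {
        printf("task_numa_work ran %d ms (tid %d, comm %s)\n",
               $dur_ms, tid, comm);
    }
    delete(@start[tid]);
}
```

Run as root, e.g. `bpftrace script.bt`, while reproducing the guest hang; output lines appearing at the same time as the stalls would point at the NUMA balancer, matching what the report observed.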