On Wed, Sep 07, 2022, František Šumšal wrote:
> On 9/7/22 17:08, Sean Christopherson wrote:
> > On Wed, Sep 07, 2022, František Šumšal wrote:
> > > Hello!
> > >
> > > In our Arch Linux part of the upstream systemd CI I recently noticed an
> > > uptrend in CPU soft lockups when running one of our tests. This test runs
> > > several systemd-nspawn containers in succession and sometimes the underlying
> > > VM locks up due to a CPU soft lockup
> >
> > By "underlying VM", do you mean L1 or L2?  Where
> >
> >   L0 == Bare Metal
> >   L1 == Arch Linux (KVM, 5.19.5-arch1-1/5.19.7-arch1-1)
> >   L2 == Arch Linux (nested KVM or QEMU TCG, 5.19.5-arch1-1/5.19.7-arch1-1)
>
> I mean L2.

Is there anything interesting in the L1 or L0 logs?  A failure in a lower level
can manifest as a soft lockup and/or stall in the VM, e.g. because a vCPU isn't
run by the host for whatever reason.

Does the bug repro with an older version of QEMU?  If it's difficult to roll back
the QEMU version, then we can punt on this question for now.

Is it possible to run the nspawn tests in L1?  If the bug repros there, that would
greatly shrink the size of the haystack.