On Wed, Sep 07, 2022, František Šumšal wrote:
> Hello!
>
> In our Arch Linux part of the upstream systemd CI I recently noticed an
> uptrend in CPU soft lockups when running one of our tests. This test runs
> several systemd-nspawn containers in succession and sometimes the underlying
> VM locks up due to a CPU soft lockup

By "underlying VM", do you mean L1 or L2?  Where

  L0 == Bare Metal
  L1 == Arch Linux (KVM, 5.19.5-arch1-1/5.19.7-arch1-1)
  L2 == Arch Linux (nested KVM or QEMU TCG, 5.19.5-arch1-1/5.19.7-arch1-1)

> (just to clarify, the topology is: CentOS Stream 8 (baremetal,
> 4.18.0-305.3.1.el8) -> Arch Linux (KVM, 5.19.5-arch1-1/5.19.7-arch1-1) ->
> Arch Linux (nested KVM or QEMU TCG, happens with both,
> 5.19.5-arch1-1/5.19.7-arch1-1) -> nspawn containers).

Since this repros with TCG, that rules out nested KVM as the culprit.

> I did some further testing, and it reproduces even when the baremetal is my
> local Fedora 36 machine (5.17.12-300.fc36.x86_64).
>
> Unfortunately, I can't provide a simple and reliable reproducer, as I can
> reproduce it only with that particular test and not reliably (sometimes it's
> the first iteration, sometimes it takes an hour or more to reproduce).
> However, I'd be more than glad to collect more information from one such
> machine, if possible.

...

> Also, in one instance, the machine died with:

Probably unrelated, but same question as above: which layer does "the machine"
refer to?