On 9/7/22 17:08, Sean Christopherson wrote:
> On Wed, Sep 07, 2022, František Šumšal wrote:
>> Hello! In our Arch Linux part of the upstream systemd CI I recently noticed an uptrend in CPU soft lockups when running one of our tests. This test runs several systemd-nspawn containers in succession and sometimes the underlying VM locks up due to a CPU soft lockup
>
> By "underlying VM", do you mean L1 or L2? Where
>
>   L0 == Bare Metal
>   L1 == Arch Linux (KVM, 5.19.5-arch1-1/5.19.7-arch1-1)
>   L2 == Arch Linux (nested KVM or QEMU TCG, 5.19.5-arch1-1/5.19.7-arch1-1)

I mean L2.

>> (just to clarify, the topology is: CentOS Stream 8 (baremetal, 4.18.0-305.3.1.el8) -> Arch Linux (KVM, 5.19.5-arch1-1/5.19.7-arch1-1) -> Arch Linux (nested KVM or QEMU TCG, happens with both, 5.19.5-arch1-1/5.19.7-arch1-1) -> nspawn containers).
>
> Since this repros with TCG, that rules out nested KVM as the culprit.

Ah, that's a good point, thanks.
I did some further testing, and it reproduces even when the baremetal is my local Fedora 36 machine (5.17.12-300.fc36.x86_64). Unfortunately, I can't provide a simple and reliable reproducer, as I can reproduce it only with that particular test and not reliably (sometimes it's the first iteration, sometimes it takes an hour or more to reproduce). However, I'd be more than glad to collect more information from one such machine, if possible.

> ...
>
>> Also, in one instance, the machine died with:
>
> Probably unrelated, but same question as above: which layer does "the machine" refer to?

Same as in the previous case - it's the L2.

--
PGP Key ID: 0xFB738CE27B634E4B