On Tue, 2024-04-16 at 14:44 +0200, Julian Stecklina wrote:
> On Mon, 2024-04-01 at 15:22 -0700, Sean Christopherson wrote:
> > On Wed, Mar 27, 2024, Julian Stecklina wrote:
> > >
> > > When we enable nested virtualization, we see what looks like corruption
> > > in the nested guest. The guest trips over exceptions that shouldn't be
> > > there. We are currently debugging this to find out details, but the
> > > setup is pretty painful and it will take a bit. If we disable the timer
> > > signals, this issue goes away (at the cost of broken VBox timers
> > > obviously...). This is weird and has left us wondering, whether there
> > > might be something broken with signals in this scenario, especially
> > > since none of the other VMMs uses this method.
> >
> > It's certainly possible there's a kernel bug, but it's probably more
> > likely a problem in your userspace. QEMU (and others VMMs) do use signals
> > to interrupt vCPUs, e.g. to take control for live migration. That's
> > obviously different than what you're doing, and will have orders of
> > magnitude lower volume of signals in nested guests, but the effective
> > coverage isn't "zero".
>
> After some weeks of bug hunting, my colleague Thomas has found the issue
> and we posted a patch:
>
> https://lore.kernel.org/kvm/20240416123558.212040-1-julian.stecklina@xxxxxxxxxxxxxxxxxxxxx/T/#t

It's this patch specifically:

https://lore.kernel.org/kvm/20240416123558.212040-1-julian.stecklina@xxxxxxxxxxxxxxxxxxxxx/T/#m2eebd2ab30a86622aea3732112150851ac0768fe

Thanks,
Julian