On 2/25/25 00:55, Sean Christopherson wrote:
This was _supposed_ to be a tiny one-off patch to fix a nVMX bug where KVM fails to detect that, after nested VM-Exit, L1 has a pending IRQ (or NMI). But because x86's nested teardown flows are garbage (KVM simply forces a nested VM-Exit to put the vCPU back into L1), that simple fix snowballed.

The immediate issue is that checking for a pending interrupt accesses the legacy PIC, and x86's kvm_arch_destroy_vm() currently frees the PIC before destroying vCPUs, i.e. checking for IRQs during the forced nested VM-Exit results in a NULL pointer deref (or use-after-free if KVM didn't nullify the PIC pointer). That's patch 1.

Patch 2 is the original nVMX fix.

The remaining patches attempt to bring a bit of sanity to x86's VM teardown code, which has accumulated a lot of cruft over the years. E.g. KVM currently unloads each vCPU's MMUs in a separate operation from destroying vCPUs, all because when guest SMP support was added, KVM had a kludgy MMU teardown flow that broke when a VM had more than one vCPU. And that oddity lived on, for 18 years...
Queued patches 1 and 2 to kvm/master, and everything to kvm/queue (pending a little more testing and the related TDX change).
Paolo