Hi Sean,

On Fri, Jul 19, 2024 at 05:01:33PM -0700, Sean Christopherson wrote:
> Move the logic to get the to-be-acknowledge IRQ for a nested VM-Exit from
> nested_vmx_vmexit() to vmx_check_nested_events(), which is subtly the one
> and only path where KVM invokes nested_vmx_vmexit() with
> EXIT_REASON_EXTERNAL_INTERRUPT. A future fix will perform a last-minute
> check on L2's nested posted interrupt notification vector, just before
> injecting a nested VM-Exit. To handle that scenario correctly, KVM needs
> to get the interrupt _before_ injecting VM-Exit, as simply querying the
> highest priority interrupt, via kvm_cpu_has_interrupt(), would result in
> TOCTOU bug, as a new, higher priority interrupt could arrive between
> kvm_cpu_has_interrupt() and kvm_cpu_get_interrupt().
>
> Opportunistically convert the WARN_ON() to a WARN_ON_ONCE(). If KVM has
> a bug that results in a false positive from kvm_cpu_has_interrupt(),
> spamming dmesg won't help the situation.
>
> Note, nested_vmx_reflect_vmexit() can never reflect external interrupts as
> they are always "wanted" by L0.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/kvm/vmx/nested.c | 25 ++++++++++++++++---------
>  1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 2392a7ef254d..b3e17635f7e3 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4284,11 +4284,26 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>  	}
>
>  	if (kvm_cpu_has_interrupt(vcpu) && !vmx_interrupt_blocked(vcpu)) {
> +		u32 exit_intr_info;
> +
>  		if (block_nested_events)
>  			return -EBUSY;
>  		if (!nested_exit_on_intr(vcpu))
>  			goto no_vmexit;
> -		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
> +
> +		if (nested_exit_intr_ack_set(vcpu)) {
> +			int irq;
> +
> +			irq = kvm_cpu_get_interrupt(vcpu);
> +			WARN_ON_ONCE(irq < 0);
> +
> +			exit_intr_info = INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR | irq;
> +		} else {
> +			exit_intr_info = 0;
> +		}
> +
> +		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
> +				  exit_intr_info, 0);
>  		return 0;
>  	}
>
> @@ -4969,14 +4984,6 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>  	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>
>  	if (likely(!vmx->fail)) {
> -		if ((u16)vm_exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
> -		    nested_exit_intr_ack_set(vcpu)) {
> -			int irq = kvm_cpu_get_interrupt(vcpu);
> -			WARN_ON(irq < 0);
> -			vmcs12->vm_exit_intr_info = irq |
> -				INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
> -		}
> -
>  		if (vm_exit_reason != -1)
>  			trace_kvm_nested_vmexit_inject(vmcs12->vm_exit_reason,
>  						       vmcs12->exit_qualification,
> --
> 2.45.2.1089.g2a221341d9-goog
>

I bisected (log below) an issue with starting a nested guest that appears
on two of my newer Intel test machines (but not a somewhat old laptop)
when this change, as commit 6f373f4d941b ("KVM: nVMX: Get
to-be-acknowledge IRQ for nested VM-Exit at injection site") in -next, is
present in the host kernel.
I start a virtual machine with a full distribution using QEMU, then start
a nested virtual machine using QEMU with the same kernel and a much
simpler Buildroot initrd, just to test the ability to run a nested guest.
After this change, starting a nested guest results in no output from the
nested guest and eventually the first guest restarts, sometimes printing
a lockup message that appears to be caused by qemu-system-x86.

My command for the first guest on the host (in case it matters):

  $ qemu-system-x86_64 \
      -display none \
      -serial mon:stdio \
      -nic user,model=virtio-net-pci,hostfwd=tcp::8022-:22 \
      -object rng-random,filename=/dev/urandom,id=rng0 \
      -device virtio-rng-pci \
      -chardev socket,id=char0,path=$VM_FOLDER/x86_64/arch/vfsd.sock \
      -device vhost-user-fs-pci,queue-size=1024,chardev=char0,tag=host \
      -object memory-backend-memfd,id=mem,share=on,size=16G \
      -numa node,memdev=mem \
      -m 16G \
      -device virtio-balloon \
      -smp 8 \
      -drive if=pflash,format=raw,file=$VM_FOLDER/x86_64/arch/efi.img,readonly=on \
      -drive if=pflash,format=raw,file=$VM_FOLDER/x86_64/arch/efi_vars.img \
      -cpu host \
      -enable-kvm \
      -M q35 \
      -drive if=virtio,format=qcow2,file=$VM_FOLDER/x86_64/arch/disk.img

My commands in the first guest to spawn the nested guest:

  $ cd $(mktemp -d)

  $ curl -LSs https://archive.archlinux.org/packages/l/linux/linux-6.10.8.arch1-1-x86_64.pkg.tar.zst | tar --zstd -xf -

  $ curl -LSs https://github.com/ClangBuiltLinux/boot-utils/releases/download/20230707-182910/x86_64-rootfs.cpio.zst | zstd -d >rootfs.cpio

  $ qemu-system-x86_64 \
      -display none \
      -nodefaults \
      -M q35 \
      -d unimp,guest_errors \
      -append 'console=ttyS0 earlycon=uart8250,io,0x3f8 loglevel=7' \
      -kernel usr/lib/modules/6.10.8-arch1-1/vmlinuz \
      -initrd rootfs.cpio \
      -cpu host \
      -enable-kvm \
      -m 512m \
      -smp 8 \
      -serial mon:stdio

If there is any additional information I can provide or patches I can
test, I am more than happy to do so.

Cheers,
Nathan

# bad: [6804f0edbe7747774e6ae60f20cec4ee3ad7c187] Add linux-next specific files for 20240903
# good: [67784a74e258a467225f0e68335df77acd67b7ab] Merge tag 'ata-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
git bisect start '6804f0edbe7747774e6ae60f20cec4ee3ad7c187' '67784a74e258a467225f0e68335df77acd67b7ab'
# good: [6b63f16410fa86f6a2e9f52c9cb52ba94c363f3e] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git
git bisect good 6b63f16410fa86f6a2e9f52c9cb52ba94c363f3e
# good: [194eaf7dd63eef7cee974daeb4df01a3e6b144fe] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git
git bisect good 194eaf7dd63eef7cee974daeb4df01a3e6b144fe
# bad: [a8f65643f59dac67451d09ff298fa7f6e7917794] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git
git bisect bad a8f65643f59dac67451d09ff298fa7f6e7917794
# good: [f80eff5b9f33c4f8d86ba046f3ee54c4f2224454] Merge branch 'timers/drivers/next' of https://git.linaro.org/people/daniel.lezcano/linux.git
git bisect good f80eff5b9f33c4f8d86ba046f3ee54c4f2224454
# bad: [a93e40d038ccd17e6cf691e1b8fec8821998baf2] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git
git bisect bad a93e40d038ccd17e6cf691e1b8fec8821998baf2
# good: [500b6c92524183f97e3a3c8e6c56f8af69429ba4] Merge branch 'non-rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
git bisect good 500b6c92524183f97e3a3c8e6c56f8af69429ba4
# bad: [642613182efa0927c8bd4d4ef2c6b8350554b6ad] Merge branches 'fixes', 'generic', 'misc', 'mmu', 'pat_vmx_msrs', 'selftests', 'svm' and 'vmx'
git bisect bad 642613182efa0927c8bd4d4ef2c6b8350554b6ad
# good: [1876dd69dfe8c29e249070b0e4dc941fc15ac1e4] KVM: x86: Add fastpath handling of HLT VM-Exits
git bisect good 1876dd69dfe8c29e249070b0e4dc941fc15ac1e4
# bad: [44518120c4ca569cfb9c6199e94c312458dc1c07] KVM: nVMX: Detect nested posted interrupt NV at nested VM-Exit injection
git bisect bad 44518120c4ca569cfb9c6199e94c312458dc1c07
# good: [2ab637df5f68d4e0cfa9becd10f43400aee785b3] KVM: VMX: hyper-v: Prevent impossible NULL pointer dereference in evmcs_load()
git bisect good 2ab637df5f68d4e0cfa9becd10f43400aee785b3
# bad: [f729851189d5741e7d1059e250422611028657f8] KVM: x86: Don't move VMX's nested PI notification vector from IRR to ISR
git bisect bad f729851189d5741e7d1059e250422611028657f8
# bad: [cb14e454add0efc9bc461c1ad30d9c950c406fab] KVM: nVMX: Suppress external interrupt VM-Exit injection if there's no IRQ
git bisect bad cb14e454add0efc9bc461c1ad30d9c950c406fab
# bad: [6f373f4d941bf79f06e9abd4a827288ad1de6399] KVM: nVMX: Get to-be-acknowledge IRQ for nested VM-Exit at injection site
git bisect bad 6f373f4d941bf79f06e9abd4a827288ad1de6399
# first bad commit: [6f373f4d941bf79f06e9abd4a827288ad1de6399] KVM: nVMX: Get to-be-acknowledge IRQ for nested VM-Exit at injection site