On Wed, Sep 04, 2024, Sean Christopherson wrote:
> On Wed, Sep 04, 2024, Nathan Chancellor wrote:
> > I bisected (log below) an issue with starting a nested guest that
> > appears on two of my newer Intel test machines (but not a somewhat old
> > laptop) when this change as commit 6f373f4d941b ("KVM: nVMX: Get
> > to-be-acknowledge IRQ for nested VM-Exit at injection site") in -next is
> > present in the host kernel.
> >
> > I start a virtual machine with a full distribution using QEMU then start
> > a nested virtual machine using QEMU with the same kernel and a much
> > simpler Buildroot initrd, just to test the ability to run a nested
> > guest. After this change, starting a nested guest results in no output
> > from the nested guest and eventually the first guest restarts, sometimes
> > printing a lockup message that appears to be caused from qemu-system-x86
>
> *sigh*
>
> It's not you, it's me.
>
> I just bisected hangs in my nested setup to this same commit. Apparently, I
> completely and utterly failed at testing.
>
> There isn't that much going on here, so knock wood, getting a root cause shouldn't
> be terribly difficult.

Well fudge. My attempt to avoid splitting kvm_get_apic_interrupt() and exposing
more lapic.c internals to nested VMX failed spectacularly.

Hiding down in apic_set_isr() is a call to hwapic_isr_update(), which updates
vmcs.GUEST_INTERRUPT_STATUS.SVI to mirror the highest vector in the virtual APIC's
ISR. On a nested VM-Exit due to an IRQ, that update is supposed to hit vmcs01. By
moving the call to kvm_get_apic_interrupt() out of nested_vmx_vmexit(), that update
hits vmcs02 instead, and things go downhill from there.

The obvious/easy solution is to split kvm_get_apic_interrupt() so that nVMX can
find an interrupt, emulate nested VM-Exit or posted interrupt processing as
appropriate, and _then_ ACK the IRQ (if a VM-Exit was synthesized).
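To make the ordering problem concrete, here is a self-contained toy model in C. It is only a sketch, not KVM code: every `toy_*` name is hypothetical, the 64-bit IRR/ISR bitmaps and the plain `svi` field are simplifications, and the "loaded VMCS" pointer merely stands in for whichever of vmcs01/vmcs02 is current when the ISR is updated.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: none of these names are real KVM symbols. */
struct toy_vmcs {
	int svi;			/* stands in for GUEST_INTERRUPT_STATUS.SVI */
};

struct toy_vcpu {
	uint64_t irr, isr;		/* simplified 64-vector IRR/ISR bitmaps */
	struct toy_vmcs vmcs01, vmcs02;
	struct toy_vmcs *loaded;	/* VMCS that is currently loaded */
};

/* Highest set bit in a bitmap, or -1 if the bitmap is empty. */
static int toy_highest(uint64_t map)
{
	for (int vec = 63; vec >= 0; vec--)
		if (map & (1ULL << vec))
			return vec;
	return -1;
}

/* Step 1: find the highest pending vector, with no side effects. */
static int toy_find_irq(const struct toy_vcpu *v)
{
	return toy_highest(v->irr);
}

/*
 * Step 2: ACK the vector (IRR -> ISR) and mirror the highest in-service
 * vector into SVI of whichever VMCS happens to be loaded right now.
 */
static void toy_ack_irq(struct toy_vcpu *v, int vec)
{
	v->irr &= ~(1ULL << vec);
	v->isr |= 1ULL << vec;
	v->loaded->svi = toy_highest(v->isr);
}

/* Emulated nested VM-Exit: switch from vmcs02 back to vmcs01. */
static void toy_nested_vmexit(struct toy_vcpu *v)
{
	v->loaded = &v->vmcs01;
}

/* L2 is running (vmcs02 loaded) with one pending IRQ for L1. */
static void toy_init(struct toy_vcpu *v, int vec)
{
	v->irr = 1ULL << vec;
	v->isr = 0;
	v->vmcs01.svi = v->vmcs02.svi = -1;
	v->loaded = &v->vmcs02;
}

/* Broken ordering: ACK while vmcs02 is still loaded, then exit. */
static void toy_ack_then_exit(struct toy_vcpu *v)
{
	int vec = toy_find_irq(v);

	toy_ack_irq(v, vec);		/* SVI update lands in vmcs02 */
	toy_nested_vmexit(v);
}

/* Split ordering: find, emulate the VM-Exit, and only then ACK. */
static void toy_exit_then_ack(struct toy_vcpu *v)
{
	int vec = toy_find_irq(v);

	toy_nested_vmexit(v);
	toy_ack_irq(v, vec);		/* SVI update lands in vmcs01 */
}
```

Calling toy_init() and then toy_ack_then_exit() leaves the vector in vmcs02's SVI and vmcs01 untouched, whereas toy_exit_then_ack() puts it in vmcs01. The posted interrupt path is left out of the model entirely.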
It's not really any harder than what I did here; as noted above, I just didn't
want to split kvm_get_apic_interrupt(). But I don't see any sane alternative, and
in the end it's not any worse than plumbing the notification vector into
kvm_get_apic_interrupt(); either way, we're bleeding implementation details between
common x86 code and nVMX.

Luckily, this series is sitting at the top of `kvm-x86 vmx` (yay, topic branches!),
so I'll just drop the entire series and post a full v2. Unless I botched this new
version too (haven't tested yet), I should get v2 posted tomorrow.

Sorry for pushing garbage; this should never have been posted, let alone gotten
applied to -next.