On Fri, Jul 19, 2024 at 05:01:38PM -0700, Sean Christopherson wrote:
>When synthesizing a nested VM-Exit due to an external interrupt, pend a
>nested posted interrupt if the external interrupt vector matches L2's PI
>notification vector, i.e. if the interrupt is a PI notification for L2.
>This fixes a bug where KVM will incorrectly inject VM-Exit instead of
>processing nested posted interrupt when IPI virtualization is enabled.
>
>Per the SDM, detection of the notification vector doesn't occur until the
>interrupt is acknowledged and delivered to the CPU core.
>
>  If the external-interrupt exiting VM-execution control is 1, any unmasked
>  external interrupt causes a VM exit (see Section 26.2). If the "process
>  posted interrupts" VM-execution control is also 1, this behavior is
>  changed and the processor handles an external interrupt as follows:
>
>  1. The local APIC is acknowledged; this provides the processor core
>     with an interrupt vector, called here the physical vector.
>  2. If the physical vector equals the posted-interrupt notification
>     vector, the logical processor continues to the next step. Otherwise,
>     a VM exit occurs as it would normally due to an external interrupt;
>     the vector is saved in the VM-exit interruption-information field.
>
>For the most part, KVM has avoided problems because a PI NV for L2 that
>arrives while L2 is active will be processed by hardware, and KVM checks
>for a pending notification vector during nested VM-Enter.

With this series in place, I wonder if we can remove the check for a
pending notification vector during nested VM-Enter:

	/* Emulate processing of posted interrupts on VM-Enter. */
	if (nested_cpu_has_posted_intr(vmcs12) &&
	    kvm_apic_has_interrupt(vcpu) == vmx->nested.posted_intr_nv) {
		vmx->nested.pi_pending = true;
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		kvm_apic_clear_irr(vcpu, vmx->nested.posted_intr_nv);
	}

I believe the check is arguably incorrect because:

1. nested_vmx_run() may set pi_pending and clear the IRR bit of the
   notification vector, but this doesn't guarantee that
   vmx_complete_nested_posted_interrupt() will be called later in
   vmx_check_nested_events(). This could lead to partial posted-interrupt
   processing, where the IRR bit is cleared but PIR isn't copied into VIRR.
   This might confuse L1 since, from L1's perspective, posted-interrupt
   processing should be atomic. Per the SDM, the logical processor performs
   posted-interrupt processing "in an uninterruptible manner".

2. The check doesn't respect event priority. For example, if a
   higher-priority event (a preemption timer exit or an NMI-window exit)
   causes an immediate nested VM-exit, the notification vector should
   remain pending after the nested VM-exit.
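To make the SDM ordering quoted above concrete, here is a simplified user-space model of steps 1 and 2: the vector is first acknowledged (consumed from IRR), and only then compared with the notification vector. Everything in it is hypothetical and illustrative: struct vcpu_model, handle_external_interrupt() and the vector values are assumptions, not KVM or hardware code, ISR/EOI handling is ignored, and the SDM's later posted-interrupt steps are collapsed into a single "merge PIR into VIRR" loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of SDM external-interrupt handling when
 * both "external-interrupt exiting" and "process posted interrupts" are 1.
 * ISR/EOI handling is not modeled, and the posted-interrupt processing
 * steps after step 2 are collapsed into "merge PIR into VIRR".
 */
enum outcome { OUTCOME_VMEXIT, OUTCOME_PI_PROCESSED };

struct vcpu_model {
	uint8_t notification_vector; /* posted-interrupt notification vector */
	bool irr[256];               /* local APIC IRR, one flag per vector */
	bool pir[256];               /* posted-interrupt requests */
	bool virr[256];              /* virtual IRR of the guest */
	uint8_t exit_vector;         /* vector saved on an external-interrupt VM exit */
};

/* Highest pending vector (treated as highest priority here), or -1. */
static int highest_irr(const struct vcpu_model *v)
{
	for (int i = 255; i >= 0; i--)
		if (v->irr[i])
			return i;
	return -1;
}

static enum outcome handle_external_interrupt(struct vcpu_model *v)
{
	int phys = highest_irr(v);

	assert(phys >= 0); /* caller only invokes this with an interrupt pending */

	/* Step 1: acknowledge the local APIC, yielding the physical vector. */
	v->irr[phys] = false;

	/* Step 2: anything other than the notification vector is a VM exit. */
	if (phys != v->notification_vector) {
		v->exit_vector = (uint8_t)phys;
		return OUTCOME_VMEXIT;
	}

	/* Later steps, collapsed: move all posted requests from PIR to VIRR. */
	for (int i = 0; i < 256; i++) {
		if (v->pir[i]) {
			v->virr[i] = true;
			v->pir[i] = false;
		}
	}
	return OUTCOME_PI_PROCESSED;
}
```

In this model the notification-vector comparison can only happen after the vector has been acknowledged, which is the same point at which an external-interrupt VM-Exit would otherwise be taken; that is roughly where the patch performs its check when synthesizing the nested VM-Exit.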
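The two concerns about the VM-Enter check can be illustrated with a small state model. This is a hypothetical sketch, not KVM code: struct nested_model, vm_enter_check(), complete_pi() and check_events() only loosely stand in for the quoted check in nested_vmx_run(), vmx_complete_nested_posted_interrupt() and vmx_check_nested_events(), PIR/VIRR are reduced to single flags, and the model deliberately says nothing about what KVM later does with pi_pending after the nested VM-exit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; field and function names are illustrative only. */
struct nested_model {
	bool has_posted_intr; /* stands in for nested_cpu_has_posted_intr(vmcs12) */
	bool irr_nv;          /* IRR bit of L2's notification vector */
	bool pir_nonempty;    /* PIR holds pending requests */
	bool virr_updated;    /* PIR has been merged into L2's VIRR */
	bool pi_pending;      /* stands in for vmx->nested.pi_pending */
};

/* The quoted check: set pi_pending and clear the IRR bit at VM-Enter. */
static void vm_enter_check(struct nested_model *m)
{
	if (m->has_posted_intr && m->irr_nv) {
		m->pi_pending = true;
		m->irr_nv = false; /* like kvm_apic_clear_irr() */
	}
}

/* Stand-in for completing posted-interrupt processing: merge PIR into VIRR. */
static void complete_pi(struct nested_model *m)
{
	if (!m->pi_pending)
		return;
	if (m->pir_nonempty) {
		m->virr_updated = true;
		m->pir_nonempty = false;
	}
	m->pi_pending = false;
}

/*
 * Stand-in for event checking: a higher-priority event (e.g. a preemption
 * timer or NMI-window exit) forces an immediate nested VM-exit, so the
 * completion step never runs in this model.
 */
static void check_events(struct nested_model *m, bool higher_prio_exit)
{
	if (higher_prio_exit)
		return;
	complete_pi(m);
}

/* Partial processing: IRR bit already cleared, but PIR not yet merged. */
static bool partially_processed(const struct nested_model *m)
{
	return !m->irr_nv && !m->virr_updated && m->pir_nonempty;
}
```

With higher_prio_exit set, the model ends with the notification vector no longer pending in IRR while PIR is still unmerged, which corresponds to both the partial processing in point 1 and the lost pending vector in point 2. Without a higher-priority exit, the merge completes in the same step and no partial state is visible.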