On Sat, Dec 18, 2021, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> > Hmm, that strongly suggests the "vcpu != kvm_get_running_vcpu()" is at fault.
> > Can you try running with the below commit?  It's currently sitting in kvm/queue,
> > but not marked for stable because I didn't think it was possible for the check
> > to cause a missed wake event in KVM's current code base.
> >
>
> The below commit can fix the bug; we have just completed the tests.
> Thanks.

Aha!  Somehow I missed this call chain when analyzing the change.

  irqfd_wakeup()
  |
  |-> kvm_arch_set_irq_inatomic()
      |
      |-> kvm_irq_delivery_to_apic_fast()
          |
          |-> kvm_apic_set_irq()

Paolo, can the changelog be amended to the below, and maybe even pull the
commit into 5.16?

KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU

Drop a check that guards triggering a posted interrupt on the currently
running vCPU, and more importantly guards waking the target vCPU if
triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
If a vIRQ is delivered from asynchronous context, the target vCPU can be
the currently running vCPU and can also be blocking, in which case skipping
kvm_vcpu_wake_up() effectively drops what is supposed to be a wake event
for the vCPU.

The "do nothing" logic when "vcpu == running_vcpu" mostly works only
because the majority of calls to ->deliver_posted_interrupt(), especially
when using posted interrupts, come from synchronous KVM context.  But if a
device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ
and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to
use posted interrupts, wake events from the device will be delivered to
KVM from IRQ context, e.g.

  vfio_msihandler()
  |
  |-> eventfd_signal()
      |
      |-> ...
          |
          |-> irqfd_wakeup()
              |
              |-> kvm_arch_set_irq_inatomic()
                  |
                  |-> kvm_irq_delivery_to_apic_fast()
                      |
                      |-> kvm_apic_set_irq()

This also aligns the non-nested and nested usage of triggering posted
interrupts, and will allow for additional cleanups.

Fixes: 379a3c8ee444 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath")
Cc: stable@xxxxxxxxxxxxxxx
Reported-by: Longpeng (Mike) <longpeng2@xxxxxxxxxx>
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
Message-Id: <20211208015236.1616697-18-seanjc@xxxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>

> > commit 6a8110fea2c1b19711ac1ef718680dfd940363c6
> > Author: Sean Christopherson <seanjc@xxxxxxxxxx>
> > Date:   Wed Dec 8 01:52:27 2021 +0000
> >
> >     KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU
> >
> >     Drop a check that guards triggering a posted interrupt on the currently
> >     running vCPU, and more importantly guards waking the target vCPU if
> >     triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
> >     The "do nothing" logic when "vcpu == running_vcpu" works only because KVM
> >     doesn't have a path to ->deliver_posted_interrupt() from asynchronous
> >     context, e.g. if apic_timer_expired() were changed to always go down the
> >     posted interrupt path for APICv, or if the IN_GUEST_MODE check in
> >     kvm_use_posted_timer_interrupt() were dropped, and the hrtimer fired in
> >     kvm_vcpu_block() after the final kvm_vcpu_check_block() check, the vCPU
> >     would be scheduled() out without being awakened, i.e. would "miss" the
> >     timer interrupt.
> >
> >     One could argue that invoking kvm_apic_local_deliver() from (soft) IRQ
> >     context for the current running vCPU should be illegal, but nothing in
> >     KVM actually enforces that rule.  There's also no strong obvious benefit
> >     to making such behavior illegal, e.g. checking IN_GUEST_MODE and calling
> >     kvm_vcpu_wake_up() is at worst marginally more costly than querying the
> >     current running vCPU.
> >
> >     Lastly, this aligns the non-nested and nested usage of triggering posted
> >     interrupts, and will allow for additional cleanups.
> >
> >     Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> >     Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> >     Message-Id: <20211208015236.1616697-18-seanjc@xxxxxxxxxx>
> >     Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 38749063da0e..f61a6348cffd 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -3995,8 +3995,7 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
> >  	 * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
> >  	 * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
> >  	 */
> > -	if (vcpu != kvm_get_running_vcpu() &&
> > -	    !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> > +	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> >  		kvm_vcpu_wake_up(vcpu);
> >  
> >  	return 0;