On 02/05/20 11:24, Suravee Suthikulpanit wrote:
> This is due to re-entrancy of the lazy update EOI logic
> when enable APICv with VFIO pass-through device, which
> sets up kvm_irqfd() w/ KVM_IRQFD_FLAG_RESAMPLE.
>
> Fixes by adding re-entrancy check logic.

This does not explain why this is the right fix.  The questions to
answer are: what is causing the re-entrancy, and why is dropping the
second EOI update safe?  The answer to the latter could well be "because
we've already processed it", but the answer to the former is more
important.

The re-entrancy happens because the irq state is the OR of the interrupt
state and the resamplefd state.  That is, we don't want to show the
state as 0 until we've had a chance to set the resamplefd.  But if the
interrupt has _not_ gone low, then we get an infinite loop.

So the actual root cause is that this is a level-triggered interrupt;
otherwise irqfd_inject would immediately set the
KVM_USERSPACE_IRQ_SOURCE_ID high and then low, and you wouldn't have the
infinite loop.  But in the case of level-triggered interrupts the VMEXIT
already happens because TMR is set; only edge-triggered interrupts need
the lazy invocation of the ack notifier.  So this should be the fix:

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 7668fed1ce65..ca2d73cd00a3 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -225,12 +225,12 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
 	}
 
 	/*
-	 * AMD SVM AVIC accelerate EOI write and do not trap,
-	 * in-kernel IOAPIC will not be able to receive the EOI.
+	 * AMD SVM AVIC accelerate EOI write iff the interrupt is edge
+	 * triggered, in-kernel IOAPIC will not be able to receive the EOI.
 	 * In this case, we do lazy update of the pending EOI when
 	 * trying to set IOAPIC irq.
 	 */
-	if (kvm_apicv_activated(ioapic->kvm))
+	if (edge && kvm_apicv_activated(ioapic->kvm))
 		ioapic_lazy_update_eoi(ioapic, irq);
 
 	/*

Did I miss anything in the above analysis with respect to AVIC?

Paolo