Paolo,
On 5/3/20 12:19 AM, Paolo Bonzini wrote:
On 02/05/20 11:24, Suravee Suthikulpanit wrote:
....
The questions to answer are: what is causing the re-entrancy? and why
is dropping the second EOI update safe?
The answer to the latter could well be "because we've already processed
it", but the answer to the former is more important.
The re-entrancy happens because the irq state is the OR of
the interrupt state and the resamplefd state. That is, we don't
want to show the state as 0 until we've had a chance to set the
resamplefd. But if the interrupt has _not_ gone low then we get an
infinite loop.
I'm not too familiar w/ the resamplefd. I must have missed this part.
Could you please point out to me where the OR logic is?
So the actual root cause is that this is a level-triggered interrupt,
otherwise irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID
high and then low and you wouldn't have the infinite loop.
Okay.
But in the case of level-triggered interrupts the VMEXIT already happens because
TMR is set; only edge-triggered interrupts need the lazy invocation
of the ack notifier.
For AVIC, EOI write for level-triggered would have also be trapped.
And yes, edge-triggered needs lazy ack notifier.
So this should be the fix:
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 7668fed1ce65..ca2d73cd00a3 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -225,12 +225,12 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
}
/*
- * AMD SVM AVIC accelerate EOI write and do not trap,
- * in-kernel IOAPIC will not be able to receive the EOI.
+ * AMD SVM AVIC accelerate EOI write iff the interrupt is level
+ * triggered, in-kernel IOAPIC will not be able to receive the EOI.
Actually, it should be "AMD SVM AVIC accelerate EOI write iff the interrupt is _edge_ triggered".
* In this case, we do lazy update of the pending EOI when
* trying to set IOAPIC irq.
*/
- if (kvm_apicv_activated(ioapic->kvm))
+ if (edge && kvm_apicv_activated(ioapic->kvm))
ioapic_lazy_update_eoi(ioapic, irq);
/*
Did I miss anything in the above analysis with respect to AVIC?
For AMD:
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
Thanks,
Suravee