Il 27/08/2014 16:05, Wei Wang ha scritto: > Guest may mask the IOAPIC entry before issue EOI. In such case, > EOI will not be intercepted by the hypervisor, since the corresponding > bit in eoi_exit_bitmap is not set after the masking of IOAPIC entry. > > The solution here is to OR eoi_exit_bitmap with tmr to make sure that > all level-triggered interrupts have their bits in eoi_exit_bitmap set. This commit message does not explain why this change is necessary, and the relationship between this patch and the previous one. For example: ------ Commit 0f6c0a740b7d (KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table, 2014-07-30) fixed an APICv bug where an incorrect EOI exit bitmap triggered an interrupt storm inside the guest. There is a corner case for which that patch would have disabled accelerated EOI unnecessarily. Suppose you have: - a device that was the sole user of an INTx interrupt and is hot-unplugged - an OS that masks the INTx interrupt entry in the IOAPIC after the unplug - another device that uses MSI and is subsequently hot-plugged If the OS chooses to reuse the same LAPIC interrupt vector for the two interrupts, the patch would have left the vector in the EOI exit bitmap, because KVM takes into account the stale entry in the IOAPIC redirection table. We do know exactly which masked interrupts are still in-service and thus require broadcasting an EOI to the IOAPIC: this information is in the TMR. So, this patch ORs the EOI exit bitmap provided by the ioapic with the TMR register. Thanks to the previous patch, an active level-triggered interrupt will always be included in the EOI exit bitmap. ------ However, see below. > Tested-by: Rongrong Liu <rongrongx.liu@xxxxxxxxx> > Signed-off-by: Yang Zhang <yang.z.zhang@xxxxxxxxx> > Signed-off-by: Wei Wang <wei.w.wang@xxxxxxxxx> > --- > arch/x86/kvm/lapic.c | 12 ++++++++++++ > arch/x86/kvm/lapic.h | 1 + > arch/x86/kvm/x86.c | 1 + > virt/kvm/ioapic.c | 7 ++++--- > 4 files changed, 18 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 8c1162d..0fcac3c 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -539,6 +539,18 @@ void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr) > } > } > > +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap) > +{ > + struct kvm_lapic *apic = vcpu->arch.apic; > + u32 i; > + u32 tmr; > + > + for (i = 0; i < 8; i++) { > + tmr = kvm_apic_get_reg(apic, APIC_TMR + 0x10 * i); > + *((u32 *)eoi_exit_bitmap + i) |= tmr; > + } > +} > + > static void apic_update_ppr(struct kvm_lapic *apic) > { > u32 tpr, isrv, ppr, old_ppr; > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > index 6a11845..d2b96f2 100644 > --- a/arch/x86/kvm/lapic.h > +++ b/arch/x86/kvm/lapic.h > @@ -55,6 +55,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); > > void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr); > void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir); > +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap); > int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); > int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); > int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq, > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index d401684..d23b558 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -5992,6 +5992,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) > > kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr); > kvm_apic_update_tmr(vcpu, tmr); > + kvm_apic_update_eoi_exitmap(vcpu, eoi_exit_bitmap); > kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap); > } > > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > index e8ce34c..ed15936 100644 > --- a/virt/kvm/ioapic.c > +++ b/virt/kvm/ioapic.c > @@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, > spin_lock(&ioapic->lock); > for (index = 0; index < IOAPIC_NUM_PINS; index++) { > e = &ioapic->redirtbl[index]; > - if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG || > - kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) || > - index == RTC_GSI) { > + if ((!e->fields.mask > + && e->fields.trig_mode == IOAPIC_LEVEL_TRIG) > + || kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, > + index) || index == RTC_GSI) { > if (kvm_apic_match_dest(vcpu, NULL, 0, > e->fields.dest_id, e->fields.dest_mode)) { > __set_bit(e->fields.vector, > There's still something missing here. Suppose you have the following: Program edge-triggered MSI for vector 123 Interrupt comes in, ISR[123]=1 Mask MSI Program level-triggered IOAPIC interrupt for vector 123 >> Here, TMR[123] remains 0. Send EOI for vector 123 Now the TMR will not be updated in the LAPICs, and the EOI exit bitmap will not be set correctly. To fix this, we could drop all communication of the TMR from IOAPIC to LAPIC. Instead, do what the processor does and just modify the TMR in __apic_accept_irq, based on the trig_mode. If the TMR changes, you trigger an IOAPIC scan that will also refresh the EOI exit bitmap. For the hot-unplug/hot-plug case that Yang mentioned, the EOI exit bitmap will be updated as soon as the first MSI arrives (the MSI is edge-triggered). But this is more tricky than it looks like. Because the interrupt must not be delivered until the EOI exit bitmap is right, you have to skip posted interrupt delivery. There could be races between the handling of KVM_REQ_SCAN_IOAPIC and KVM_REQ_EVENT, which are hard to get right. I'm not sure it's worthwhile for what is after all a corner case. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html