RE: [PATCH v2] KVM: x86: keep eoi exit bitmap accurate before loading it.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Paolo Bonzini wrote on 2014-08-27:
> Il 27/08/2014 16:05, Wei Wang ha scritto:
> > Guest may mask the IOAPIC entry before issue EOI. In such case, EOI
> > will not be intercepted by the hypervisor, since the corresponding bit
> > in eoi_exit_bitmap is not set after the masking of IOAPIC entry.
> >
> > The solution here is to OR eoi_exit_bitmap with tmr to make sure that
> > all level-triggered interrupts have their bits in eoi_exit_bitmap set.
> 
> This commit message does not explain why this change is necessary, and the
> relationship between this patch and the previous one.
> 
> For example:
> 
> ------
> Commit 0f6c0a740b7d (KVM: x86: always exit on EOIs for interrupts listed in
> the IOAPIC redir table, 2014-07-30) fixed an APICv bug where an incorrect EOI
> exit bitmap triggered an interrupt storm inside the guest.
> 
> There is a corner case for which that patch would have disabled accelerated
> EOI unnecessarily.  Suppose you have:
> 
> - a device that was the sole user of an INTx interrupt and is hot-unplugged
> 
> - an OS that masks the INTx interrupt entry in the IOAPIC after the unplug
> 
> - another device that uses MSI and is subsequently hot-plugged
> 
> If the OS chooses to reuse the same LAPIC interrupt vector for the two
> interrupts, the patch would have left the vector in the EOI exit bitmap, because
> KVM takes into account the stale entry in the IOAPIC redirection table.
> 
> We do know exactly which masked interrupts are still in-service and thus
> require broadcasting an EOI to the IOAPIC: this information is in the TMR.  So,
> this patch ORs the EOI exit bitmap provided by the ioapic with the TMR register.
> Thanks to the previous patch, an active level-triggered interrupt will always be
> included in the EOI exit bitmap.
> ------
> 
> However, see below.
> 
> > Tested-by: Rongrong Liu <rongrongx.liu@xxxxxxxxx>
> > Signed-off-by: Yang Zhang <yang.z.zhang@xxxxxxxxx>
> > Signed-off-by: Wei Wang <wei.w.wang@xxxxxxxxx>
> > ---
> >  arch/x86/kvm/lapic.c |   12 ++++++++++++
> >  arch/x86/kvm/lapic.h |    1 +
> >  arch/x86/kvm/x86.c   |    1 +
> >  virt/kvm/ioapic.c    |    7 ++++---
> >  4 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index
> > 8c1162d..0fcac3c 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -539,6 +539,18 @@ void kvm_apic_update_tmr(struct kvm_vcpu *vcpu,
> u32 *tmr)
> >  	}
> >  }
> >
> > +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64
> > +*eoi_exit_bitmap) {
> > +	struct kvm_lapic *apic = vcpu->arch.apic;
> > +	u32 i;
> > +	u32 tmr;
> > +
> > +	for (i = 0; i < 8; i++) {
> > +		tmr = kvm_apic_get_reg(apic, APIC_TMR + 0x10 * i);
> > +		*((u32 *)eoi_exit_bitmap + i) |= tmr;
> > +	}
> > +}
> > +
> >  static void apic_update_ppr(struct kvm_lapic *apic)  {
> >  	u32 tpr, isrv, ppr, old_ppr;
> > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index
> > 6a11845..d2b96f2 100644
> > --- a/arch/x86/kvm/lapic.h
> > +++ b/arch/x86/kvm/lapic.h
> > @@ -55,6 +55,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu);
> >
> >  void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);  void
> > kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
> > +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64
> > +*eoi_exit_bitmap);
> >  int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
> > int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);  int
> > kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index
> > d401684..d23b558 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5992,6 +5992,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu
> > *vcpu)
> >
> >  	kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr);
> >  	kvm_apic_update_tmr(vcpu, tmr);
> > +	kvm_apic_update_eoi_exitmap(vcpu, eoi_exit_bitmap);
> >  	kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);  }
> >
> > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index
> > e8ce34c..ed15936 100644
> > --- a/virt/kvm/ioapic.c
> > +++ b/virt/kvm/ioapic.c
> > @@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
> u64 *eoi_exit_bitmap,
> >  	spin_lock(&ioapic->lock);
> >  	for (index = 0; index < IOAPIC_NUM_PINS; index++) {
> >  		e = &ioapic->redirtbl[index];
> > -		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG ||
> > -		    kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index)
> ||
> > -		    index == RTC_GSI) {
> > +		if ((!e->fields.mask
> > +			&& e->fields.trig_mode == IOAPIC_LEVEL_TRIG)
> > +			|| kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
> > +				index) || index == RTC_GSI) {
> >  			if (kvm_apic_match_dest(vcpu, NULL, 0,
> >  				e->fields.dest_id, e->fields.dest_mode)) {
> >  				__set_bit(e->fields.vector,
> >
> 
> There's still something missing here.
> 
> Suppose you have the following:
> 
>     Program edge-triggered MSI for vector 123
>     Interrupt comes in, ISR[123]=1
>     Mask MSI
>     Program level-triggered IOAPIC interrupt for vector 123

You cannot assign the vector 123 to another trigger mode interrupt before previous IRQ handler finished (means issue EOI). If there is an interrupt comes from the IOAPIC entry immediately after you reprogram the entry, it will update the TMR to 1. Since we are still in previous IRQ handler, it will get confused to see the TMR becomes 1.

>          >> Here, TMR[123] remains 0.
>     Send EOI for vector 123
> 
> Now the TMR will not be updated in the LAPICs, and the EOI exit bitmap will
> not be set correctly.
> 
> To fix this, we could drop all communication of the TMR from IOAPIC to LAPIC.
> Instead, do what the processor does and just modify the TMR in
> __apic_accept_irq, based on the trig_mode.  If the TMR changes, you trigger
> an IOAPIC scan that will also refresh the EOI exit bitmap.
> 
> For the hot-unplug/hot-plug case that Yang mentioned, the EOI exit bitmap will
> be updated as soon as the first MSI arrives (the MSI is edge-triggered).
> 
> But this is more tricky than it looks like.  Because the interrupt must not be
> delivered until the EOI exit bitmap is right, you have to skip posted interrupt
> delivery.  There could be races between the handling of
> KVM_REQ_SCAN_IOAPIC and KVM_REQ_EVENT, which are hard to get right.
> I'm not sure it's worthwhile for what is after all a corner case.
> 
> Paolo

Best regards,
Yang


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux