Hi Jason, I tested below patch, it's okay, the e1000 interrupt storm disappeared. But I am going to make a bit change on it, could you help review it? >Currently, we call ioapic_service() immediately when we find the irq is still >active during eoi broadcast. But for real hardware, there's some dealy between >the EOI writing and irq delivery (system bus latency?). So we need to emulate >this behavior. Otherwise, for a guest who haven't register a proper irq handler >, it would stay in the interrupt routine as this irq would be re-injected >immediately after guest enables interrupt. This would lead guest can't move >forward and may miss the possibility to get proper irq handler registered (one >example is windows guest resuming from hibernation). > >As there's no way to differ the unhandled irq from new raised ones, this patch >solve this problems by scheduling a delayed work when the count of irq injected >during eoi broadcast exceeds a threshold value. After this patch, the guest can >move a little forward when there's no suitable irq handler in case it may >register one very soon and for guest who has a bad irq detection routine ( such >as note_interrupt() in linux ), this bad irq would be recognized soon as in the >past. > >Signed-off-by: Jason Wang <jasowang <at> redhat.com> >--- > virt/kvm/ioapic.c | 47 +++++++++++++++++++++++++++++++++++++++++++++-- > virt/kvm/ioapic.h | 2 ++ > 2 files changed, 47 insertions(+), 2 deletions(-) > >diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c >index dcaf272..892253e 100644 >--- a/virt/kvm/ioapic.c >+++ b/virt/kvm/ioapic.c > <at> <at> -221,6 +221,24 <at> <at> int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level) > return ret; > } > >+static void kvm_ioapic_eoi_inject_work(struct work_struct *work) >+{ >+ int i, ret; >+ struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic, >+ eoi_inject.work); >+ spin_lock(&ioapic->lock); >+ for (i = 0; i < IOAPIC_NUM_PINS; i++) { >+ union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; >+ >+ if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) >+ continue; >+ >+ if (ioapic->irr & (1 << i) && !ent->fields.remote_irr) >+ ret = ioapic_service(ioapic, i); >+ } >+ spin_unlock(&ioapic->lock); >+} >+ > static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, > int trigger_mode) > { > <at> <at> -249,8 +267,29 <at> <at> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, > > ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); > ent->fields.remote_irr = 0; >- if (!ent->fields.mask && (ioapic->irr & (1 << i))) >- ioapic_service(ioapic, i); >+ if (!ent->fields.mask && (ioapic->irr & (1 << i))) { >+ ++ioapic->irq_eoi; -+ ++ioapic->irq_eoi; ++ ++ioapic->irq_eoi[i]; >+ if (ioapic->irq_eoi == 100) { -+ if (ioapic->irq_eoi == 100) { ++ if (ioapic->irq_eoi[i] == 100) { >+ /* >+ * Real hardware does not deliver the irq so >+ * immediately during eoi broadcast, so we need >+ * to emulate this behavior. Otherwise, for >+ * guests who has not registered handler of a >+ * level irq, this irq would be injected >+ * immediately after guest enables interrupt >+ * (which happens usually at the end of the >+ * common interrupt routine). This would lead >+ * guest can't move forward and may miss the >+ * possibility to get proper irq handler >+ * registered. So we need to give some breath to >+ * guest. TODO: 1 is too long? >+ */ >+ schedule_delayed_work(&ioapic->eoi_inject, 1); >+ ioapic->irq_eoi = 0; -+ ioapic->irq_eoi = 0; ++ ioapic->irq_eoi[i] = 0; >+ } else { >+ ioapic_service(ioapic, i); >+ } >+ } ++ else { ++ ioapic->irq_eoi[i] = 0; ++ } > } > } I think ioapic->irq_eoi is prone to reach to 100, because during a eoi broadcast, it's possible that another interrupt's (not current eoi's corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow continually, and not too long, ioapic->irq_eoi will reach to 100. I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;". Any ideas? Zhang Haoyu > > <at> <at> -375,12 +414,14 <at> <at> void kvm_ioapic_reset(struct kvm_ioapic *ioapic) > { > int i; > >+ cancel_delayed_work_sync(&ioapic->eoi_inject); > for (i = 0; i < IOAPIC_NUM_PINS; i++) > ioapic->redirtbl[i].fields.mask = 1; > ioapic->base_address = IOAPIC_DEFAULT_BASE_ADDRESS; > ioapic->ioregsel = 0; > ioapic->irr = 0; > ioapic->id = 0; >+ ioapic->irq_eoi = 0; -+ ioapic->irq_eoi = 0; ++ memset(ioapic->irq_eoi, 0x00, IOAPIC_NUM_PINS); > update_handled_vectors(ioapic); > } > > <at> <at> -398,6 +439,7 <at> <at> int kvm_ioapic_init(struct kvm *kvm) > if (!ioapic) > return -ENOMEM; > spin_lock_init(&ioapic->lock); >+ INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work); > kvm->arch.vioapic = ioapic; > kvm_ioapic_reset(ioapic); > kvm_iodevice_init(&ioapic->dev, &ioapic_mmio_ops); > <at> <at> -418,6 +460,7 <at> <at> void kvm_ioapic_destroy(struct kvm *kvm) > { > struct kvm_ioapic *ioapic = kvm->arch.vioapic; > >+ cancel_delayed_work_sync(&ioapic->eoi_inject); > if (ioapic) { > kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS, &ioapic->dev); > kvm->arch.vioapic = NULL; >diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h >index 0b190c3..8938e66 100644 >--- a/virt/kvm/ioapic.h >+++ b/virt/kvm/ioapic.h > <at> <at> -47,6 +47,8 <at> <at> struct kvm_ioapic { > void (*ack_notifier)(void *opaque, int irq); > spinlock_t lock; > DECLARE_BITMAP(handled_vectors, 256); >+ struct delayed_work eoi_inject; >+ u32 irq_eoi; -+ u32 irq_eoi; ++ u32 irq_eoi[IOAPIC_NUM_PINS]; > }; > > #ifdef DEBUG -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html