Hi, Yang, Gleb, Michael, Could you help review below patch please? Thanks, Zhang Haoyu >> Hi Jason, >> I tested below patch, it's okay, the e1000 interrupt storm disappeared. >> But I am going to make a bit change on it, could you help review it? >> >>> Currently, we call ioapic_service() immediately when we find the irq is still >>> active during eoi broadcast. But for real hardware, there's some dealy between >>> the EOI writing and irq delivery (system bus latency?). So we need to emulate >>> this behavior. Otherwise, for a guest who haven't register a proper irq handler >>> , it would stay in the interrupt routine as this irq would be re-injected >>> immediately after guest enables interrupt. This would lead guest can't move >>> forward and may miss the possibility to get proper irq handler registered (one >>> example is windows guest resuming from hibernation). >>> >>> As there's no way to differ the unhandled irq from new raised ones, this patch >>> solve this problems by scheduling a delayed work when the count of irq injected >>> during eoi broadcast exceeds a threshold value. After this patch, the guest can >>> move a little forward when there's no suitable irq handler in case it may >>> register one very soon and for guest who has a bad irq detection routine ( such >>> as note_interrupt() in linux ), this bad irq would be recognized soon as in the >>> past. >>> >>> Signed-off-by: Jason Wang <jasowang <at> redhat.com> >>> --- >>> virt/kvm/ioapic.c | 47 +++++++++++++++++++++++++++++++++++++++++++++-- >>> virt/kvm/ioapic.h | 2 ++ >>> 2 files changed, 47 insertions(+), 2 deletions(-) >>> >>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c >>> index dcaf272..892253e 100644 >>> --- a/virt/kvm/ioapic.c >>> +++ b/virt/kvm/ioapic.c >>> <at> <at> -221,6 +221,24 <at> <at> int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level) >>> return ret; >>> } >>> >>> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work) >>> +{ >>> + int i, ret; >>> + struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic, >>> + eoi_inject.work); >>> + spin_lock(&ioapic->lock); >>> + for (i = 0; i < IOAPIC_NUM_PINS; i++) { >>> + union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; >>> + >>> + if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) >>> + continue; >>> + >>> + if (ioapic->irr & (1 << i) && !ent->fields.remote_irr) >>> + ret = ioapic_service(ioapic, i); >>> + } >>> + spin_unlock(&ioapic->lock); >>> +} >>> + >>> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, >>> int trigger_mode) >>> { >>> <at> <at> -249,8 +267,29 <at> <at> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, >>> >>> ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); >>> ent->fields.remote_irr = 0; >>> - if (!ent->fields.mask && (ioapic->irr & (1 << i))) >>> - ioapic_service(ioapic, i); >>> + if (!ent->fields.mask && (ioapic->irr & (1 << i))) { >>> + ++ioapic->irq_eoi; >> -+ ++ioapic->irq_eoi; >> ++ ++ioapic->irq_eoi[i]; >>> + if (ioapic->irq_eoi == 100) { >> -+ if (ioapic->irq_eoi == 100) { >> ++ if (ioapic->irq_eoi[i] == 100) { >>> + /* >>> + * Real hardware does not deliver the irq so >>> + * immediately during eoi broadcast, so we need >>> + * to emulate this behavior. Otherwise, for >>> + * guests who has not registered handler of a >>> + * level irq, this irq would be injected >>> + * immediately after guest enables interrupt >>> + * (which happens usually at the end of the >>> + * common interrupt routine). This would lead >>> + * guest can't move forward and may miss the >>> + * possibility to get proper irq handler >>> + * registered. So we need to give some breath to >>> + * guest. TODO: 1 is too long? >>> + */ >>> + schedule_delayed_work(&ioapic->eoi_inject, 1); >>> + ioapic->irq_eoi = 0; >> -+ ioapic->irq_eoi = 0; >> ++ ioapic->irq_eoi[i] = 0; >>> + } else { >>> + ioapic_service(ioapic, i); >>> + } >>> + } >> ++ else { >> ++ ioapic->irq_eoi[i] = 0; >> ++ } >>> } >>> } >> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi broadcast, >> it's possible that another interrupt's (not current eoi's corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow continually, >> and not too long, ioapic->irq_eoi will reach to 100. >> I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;". >> Any ideas? >> >> Zhang Haoyu > >I don't have objection to this per irq counter, but I don't see obvious >difference. > >Btw, the patch has been rejected in the past since we're not sure >whether this behaviour is architectural (or ask Intel?). You need >convince the Gleb and other kvm maintainers for this idea first :). -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html