On 09/04/2014 09:56 AM, Zhang Haoyu wrote: >>> Hi Jason, >>> >> I tested below patch, it's okay, the e1000 interrupt storm disappeared. >>> >> But I am going to make a bit change on it, could you help review it? >>> >> >>>> >> >Currently, we call ioapic_service() immediately when we find the irq is still >>>> >> >active during eoi broadcast. But for real hardware, there's some dealy between >>>> >> >the EOI writing and irq delivery (system bus latency?). So we need to emulate >>>> >> >this behavior. Otherwise, for a guest who haven't register a proper irq handler >>>> >> >, it would stay in the interrupt routine as this irq would be re-injected >>>> >> >immediately after guest enables interrupt. This would lead guest can't move >>>> >> >forward and may miss the possibility to get proper irq handler registered (one >>>> >> >example is windows guest resuming from hibernation). >>>> >> > >>>> >> >As there's no way to differ the unhandled irq from new raised ones, this patch >>>> >> >solve this problems by scheduling a delayed work when the count of irq injected >>>> >> >during eoi broadcast exceeds a threshold value. After this patch, the guest can >>>> >> >move a little forward when there's no suitable irq handler in case it may >>>> >> >register one very soon and for guest who has a bad irq detection routine ( such >>>> >> >as note_interrupt() in linux ), this bad irq would be recognized soon as in the >>>> >> >past. >>>> >> > >>>> >> >Signed-off-by: Jason Wang <jasowang <at> redhat.com> >>>> >> >--- >>>> >> > virt/kvm/ioapic.c | 47 +++++++++++++++++++++++++++++++++++++++++++++-- >>>> >> > virt/kvm/ioapic.h | 2 ++ >>>> >> > 2 files changed, 47 insertions(+), 2 deletions(-) >>>> >> > >>>> >> >diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c >>>> >> >index dcaf272..892253e 100644 >>>> >> >--- a/virt/kvm/ioapic.c >>>> >> >+++ b/virt/kvm/ioapic.c >>>> >> > <at> <at> -221,6 +221,24 <at> <at> int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level) >>>> >> > return ret; >>>> >> > } >>>> >> > >>>> >> >+static void kvm_ioapic_eoi_inject_work(struct work_struct *work) >>>> >> >+{ >>>> >> >+ int i, ret; >>>> >> >+ struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic, >>>> >> >+ eoi_inject.work); >>>> >> >+ spin_lock(&ioapic->lock); >>>> >> >+ for (i = 0; i < IOAPIC_NUM_PINS; i++) { >>>> >> >+ union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; >>>> >> >+ >>>> >> >+ if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) >>>> >> >+ continue; >>>> >> >+ >>>> >> >+ if (ioapic->irr & (1 << i) && !ent->fields.remote_irr) >>>> >> >+ ret = ioapic_service(ioapic, i); >>>> >> >+ } >>>> >> >+ spin_unlock(&ioapic->lock); >>>> >> >+} >>>> >> >+ >>>> >> > static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, >>>> >> > int trigger_mode) >>>> >> > { >>>> >> > <at> <at> -249,8 +267,29 <at> <at> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, >>>> >> > >>>> >> > ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); >>>> >> > ent->fields.remote_irr = 0; >>>> >> >- if (!ent->fields.mask && (ioapic->irr & (1 << i))) >>>> >> >- ioapic_service(ioapic, i); >>>> >> >+ if (!ent->fields.mask && (ioapic->irr & (1 << i))) { >>>> >> >+ ++ioapic->irq_eoi; >>> >> -+ ++ioapic->irq_eoi; >>> >> ++ ++ioapic->irq_eoi[i]; >>>> >> >+ if (ioapic->irq_eoi == 100) { >>> >> -+ if (ioapic->irq_eoi == 100) { >>> >> ++ if (ioapic->irq_eoi[i] == 100) { >>>> >> >+ /* >>>> >> >+ * Real hardware does not deliver the irq so >>>> >> >+ * immediately during eoi broadcast, so we need >>>> >> >+ * to emulate this behavior. Otherwise, for >>>> >> >+ * guests who has not registered handler of a >>>> >> >+ * level irq, this irq would be injected >>>> >> >+ * immediately after guest enables interrupt >>>> >> >+ * (which happens usually at the end of the >>>> >> >+ * common interrupt routine). This would lead >>>> >> >+ * guest can't move forward and may miss the >>>> >> >+ * possibility to get proper irq handler >>>> >> >+ * registered. So we need to give some breath to >>>> >> >+ * guest. TODO: 1 is too long? >>>> >> >+ */ >>>> >> >+ schedule_delayed_work(&ioapic->eoi_inject, 1); >>>> >> >+ ioapic->irq_eoi = 0; >>> >> -+ ioapic->irq_eoi = 0; >>> >> ++ ioapic->irq_eoi[i] = 0; >>>> >> >+ } else { >>>> >> >+ ioapic_service(ioapic, i); >>>> >> >+ } >>>> >> >+ } >>> >> ++ else { >>> >> ++ ioapic->irq_eoi[i] = 0; >>> >> ++ } >>>> >> > } >>>> >> > } >>> >> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi broadcast, >>> >> it's possible that another interrupt's (not current eoi's corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow continually, >>> >> and not too long, ioapic->irq_eoi will reach to 100. >>> >> I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;". >>> >> Any ideas? >>> >> >>> >> Zhang Haoyu >> > >> >I'm a bit concerned how this will affect realtime guests. >> >Worth adding a flag to enable this, so that e.g. virtio is not >> >affected? >> > > Your concern is reasonable. > If applying Jason's original patch, sometimes the virtio's interrupt delay is more than 4ms(my host's HZ=250), > but very rarely happened. > And with my above change on it(per irq counter instead of total irq counter), the delayed virtio interrupt is more rarely happened, > then I use 1000 instead of 100 on "if (ioapic->irq_eoi[i] == 1000)", I made a test for 10min, the delayed virtio interrupt has not happened. > > Thanks, > Zhang Haoyu > I agree 100 is too aggressive here. Probably you may use a number even much higher than 1000. One more thing, may worth to add a tracepoint also if we really want this. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html