Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr bit always set

"Zhang Haoyu" <zhanghy@xxxxxxxxxxx> · Fri, 29 Aug 2014 11:14:36 +0800

Hi, Yang, Gleb, Michael,
Could you help review below patch please?

Thanks,
Zhang Haoyu

>> Hi Jason,
>> I tested below patch, it's okay, the e1000 interrupt storm disappeared.
>> But I am going to make a bit change on it, could you help review it?
>>
>>> Currently, we call ioapic_service() immediately when we find the irq is still
>>> active during eoi broadcast. But for real hardware, there's some dealy between
>>> the EOI writing and irq delivery (system bus latency?). So we need to emulate
>>> this behavior. Otherwise, for a guest who haven't register a proper irq handler
>>> , it would stay in the interrupt routine as this irq would be re-injected
>>> immediately after guest enables interrupt. This would lead guest can't move
>>> forward and may miss the possibility to get proper irq handler registered (one
>>> example is windows guest resuming from hibernation).
>>>
>>> As there's no way to differ the unhandled irq from new raised ones, this patch
>>> solve this problems by scheduling a delayed work when the count of irq injected
>>> during eoi broadcast exceeds a threshold value. After this patch, the guest can
>>> move a little forward when there's no suitable irq handler in case it may
>>> register one very soon and for guest who has a bad irq detection routine ( such
>>> as note_interrupt() in linux ), this bad irq would be recognized soon as in the
>>> past.
>>>
>>> Signed-off-by: Jason Wang <jasowang <at> redhat.com>
>>> ---
>>> virt/kvm/ioapic.c |   47 +++++++++++++++++++++++++++++++++++++++++++++--
>>> virt/kvm/ioapic.h |    2 ++
>>> 2 files changed, 47 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>>> index dcaf272..892253e 100644
>>> --- a/virt/kvm/ioapic.c
>>> +++ b/virt/kvm/ioapic.c
>>> <at>  <at>  -221,6 +221,24  <at>  <at>  int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
>>> 	return ret;
>>> }
>>>
>>> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>>> +{
>>> +	int i, ret;
>>> +	struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic,
>>> +						 eoi_inject.work);
>>> +	spin_lock(&ioapic->lock);
>>> +	for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>>> +		union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
>>> +
>>> +		if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG)
>>> +			continue;
>>> +
>>> +		if (ioapic->irr & (1 << i) && !ent->fields.remote_irr)
>>> +			ret = ioapic_service(ioapic, i);
>>> +	}
>>> +	spin_unlock(&ioapic->lock);
>>> +}
>>> +
>>> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>>> 				     int trigger_mode)
>>> {
>>> <at>  <at>  -249,8 +267,29  <at>  <at>  static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector,
>>>
>>> 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>>> 		ent->fields.remote_irr = 0;
>>> -		if (!ent->fields.mask && (ioapic->irr & (1 << i)))
>>> -			ioapic_service(ioapic, i);
>>> +		if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
>>> +			++ioapic->irq_eoi;
>> -+			++ioapic->irq_eoi;
>> ++		    ++ioapic->irq_eoi[i];
>>> +			if (ioapic->irq_eoi == 100) {
>> -+			if (ioapic->irq_eoi == 100) {
>> ++			if (ioapic->irq_eoi[i] == 100) {
>>> +				/*
>>> +				 * Real hardware does not deliver the irq so
>>> +				 * immediately during eoi broadcast, so we need
>>> +				 * to emulate this behavior. Otherwise, for
>>> +				 * guests who has not registered handler of a
>>> +				 * level irq, this irq would be injected
>>> +				 * immediately after guest enables interrupt
>>> +				 * (which happens usually at the end of the
>>> +				 * common interrupt routine). This would lead
>>> +				 * guest can't move forward and may miss the
>>> +				 * possibility to get proper irq handler
>>> +				 * registered. So we need to give some breath to
>>> +				 * guest. TODO: 1 is too long?
>>> +				 */
>>> +				schedule_delayed_work(&ioapic->eoi_inject, 1);
>>> +				ioapic->irq_eoi = 0;
>> -+				ioapic->irq_eoi = 0;
>> ++				ioapic->irq_eoi[i] = 0;
>>> +			} else {
>>> +				ioapic_service(ioapic, i);
>>> +			}
>>> +		}
>> ++		else {
>> ++			ioapic->irq_eoi[i] = 0;
>> ++		}
>>> 	}
>>> }
>> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi broadcast, 
>> it's possible that another interrupt's (not current eoi's corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow continually,
>> and not too long, ioapic->irq_eoi will reach to 100.
>> I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;".
>> Any ideas?
>>
>> Zhang Haoyu
>
>I don't have objection to this per irq counter, but I don't see obvious
>difference.
>
>Btw, the patch has been rejected in the past since we're not sure
>whether this behaviour is architectural (or ask Intel?). You need
>convince the Gleb and other kvm maintainers for this idea first :).

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html