Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding

Eric Auger <eric.auger@xxxxxxxxxx> · Wed, 10 Aug 2022 10:12:18 +0200

Hi Marc,

On 8/10/22 08:51, Marc Zyngier wrote:
> On Wed, 10 Aug 2022 00:30:29 +0100,
> Dmytro Maluka <dmy@xxxxxxxxxxxx> wrote:
>> On 8/9/22 10:01 PM, Dong, Eddie wrote:
>>>
>>>> -----Original Message-----
>>>> From: Dmytro Maluka <dmy@xxxxxxxxxxxx>
>>>> Sent: Tuesday, August 9, 2022 12:24 AM
>>>> To: Dong, Eddie <eddie.dong@xxxxxxxxx>; Christopherson,, Sean
>>>> <seanjc@xxxxxxxxxx>; Paolo Bonzini <pbonzini@xxxxxxxxxx>;
>>>> kvm@xxxxxxxxxxxxxxx
>>>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>;
>>>> Borislav Petkov <bp@xxxxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>;
>>>> x86@xxxxxxxxxx; H. Peter Anvin <hpa@xxxxxxxxx>; linux-
>>>> kernel@xxxxxxxxxxxxxxx; Eric Auger <eric.auger@xxxxxxxxxx>; Alex
>>>> Williamson <alex.williamson@xxxxxxxxxx>; Liu, Rong L <rong.l.liu@xxxxxxxxx>;
>>>> Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>; Tomasz Nowicki
>>>> <tn@xxxxxxxxxxxx>; Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx>;
>>>> upstream@xxxxxxxxxxxx; Dmitry Torokhov <dtor@xxxxxxxxxx>
>>>> Subject: Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding
>>>>
>>>> On 8/9/22 1:26 AM, Dong, Eddie wrote:
>>>>>> The existing KVM mechanism for forwarding of level-triggered
>>>>>> interrupts using resample eventfd doesn't work quite correctly in the
>>>>>> case of interrupts that are handled in a Linux guest as oneshot
>>>>>> interrupts (IRQF_ONESHOT). Such an interrupt is acked to the device
>>>>>> in its threaded irq handler, i.e. later than it is acked to the
>>>>>> interrupt controller (EOI at the end of hardirq), not earlier. The
>>>>>> existing KVM code doesn't take that into account, which results in
>>>>>> erroneous extra interrupts in the guest caused by premature re-assert of an
>>>> unacknowledged IRQ by the host.
>>>>> Interesting...  How it behaviors in native side?
>>>> In native it behaves correctly, since Linux masks such a oneshot interrupt at the
>>>> beginning of hardirq, so that the EOI at the end of hardirq doesn't result in its
>>>> immediate re-assert, and then unmasks it later, after its threaded irq handler
>>>> completes.
>>>>
>>>> In handle_fasteoi_irq():
>>>>
>>>> 	if (desc->istate & IRQS_ONESHOT)
>>>> 		mask_irq(desc);
>>>>
>>>> 	handle_irq_event(desc);
>>>>
>>>> 	cond_unmask_eoi_irq(desc, chip);
>>>>
>>>>
>>>> and later in unmask_threaded_irq():
>>>>
>>>> 	unmask_irq(desc);
>>>>
>>>> I also mentioned that in patch #3 description:
>>>> "Linux keeps such interrupt masked until its threaded handler finishes, to
>>>> prevent the EOI from re-asserting an unacknowledged interrupt.
>>> That makes sense. Can you include the full story in cover letter too?
>> Ok, I will.
>>
>>>
>>>> However, with KVM + vfio (or whatever is listening on the resamplefd) we don't
>>>> check that the interrupt is still masked in the guest at the moment of EOI.
>>>> Resamplefd is notified regardless, so vfio prematurely unmasks the host
>>>> physical IRQ, thus a new (unwanted) physical interrupt is generated in the host
>>>> and queued for injection to the guest."
> Sorry to barge in pretty late in the conversation (just been Cc'd on
> this), but why shouldn't the resamplefd be notified? If there has been
yeah sorry to get you involved here ;-)
> an EOI, a new level must be made visible to the guest interrupt
> controller, no matter what the state of the interrupt masking is.
>
> Whether this new level is actually *presented* to a vCPU is another
> matter entirely, and is arguably a problem for the interrupt
> controller emulation.

FWIU on guest EOI the physical line is still asserted so the pIRQ is
immediatly re-sampled by the interrupt controller (because the
resamplefd unmasked the physical IRQ) and recorded as a guest IRQ
(although it is masked at guest level). When the guest actually unmasks
the vIRQ we do not get a chance to re-evaluate the physical line level.

When running native, when EOI is sent, the physical line is still
asserted but the IRQ is masked. When unmasking, the line is de-asserted.

Thanks

Eric
>
> For example on arm64, we expect to be able to read the pending state
> of an interrupt from the guest irrespective of the masking state of
> that interrupt. Any change to the interrupt flow should preserve this.
>
> Thankfully, we don't have the polarity issue (there is no such thing
> in the GIC architecture) and we only deal with pending/not-pending.
>
> Thanks,
>
> 	M.
>