On Wed, 10 Aug 2022 09:12:18 +0100, Eric Auger <eric.auger@xxxxxxxxxx> wrote: > > Hi Marc, > > On 8/10/22 08:51, Marc Zyngier wrote: > > On Wed, 10 Aug 2022 00:30:29 +0100, > > Dmytro Maluka <dmy@xxxxxxxxxxxx> wrote: > >> On 8/9/22 10:01 PM, Dong, Eddie wrote: > >>> > >>>> -----Original Message----- > >>>> From: Dmytro Maluka <dmy@xxxxxxxxxxxx> > >>>> Sent: Tuesday, August 9, 2022 12:24 AM > >>>> To: Dong, Eddie <eddie.dong@xxxxxxxxx>; Christopherson,, Sean > >>>> <seanjc@xxxxxxxxxx>; Paolo Bonzini <pbonzini@xxxxxxxxxx>; > >>>> kvm@xxxxxxxxxxxxxxx > >>>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; > >>>> Borislav Petkov <bp@xxxxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>; > >>>> x86@xxxxxxxxxx; H. Peter Anvin <hpa@xxxxxxxxx>; linux- > >>>> kernel@xxxxxxxxxxxxxxx; Eric Auger <eric.auger@xxxxxxxxxx>; Alex > >>>> Williamson <alex.williamson@xxxxxxxxxx>; Liu, Rong L <rong.l.liu@xxxxxxxxx>; > >>>> Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>; Tomasz Nowicki > >>>> <tn@xxxxxxxxxxxx>; Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx>; > >>>> upstream@xxxxxxxxxxxx; Dmitry Torokhov <dtor@xxxxxxxxxx> > >>>> Subject: Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding > >>>> > >>>> On 8/9/22 1:26 AM, Dong, Eddie wrote: > >>>>>> The existing KVM mechanism for forwarding of level-triggered > >>>>>> interrupts using resample eventfd doesn't work quite correctly in the > >>>>>> case of interrupts that are handled in a Linux guest as oneshot > >>>>>> interrupts (IRQF_ONESHOT). Such an interrupt is acked to the device > >>>>>> in its threaded irq handler, i.e. later than it is acked to the > >>>>>> interrupt controller (EOI at the end of hardirq), not earlier. The > >>>>>> existing KVM code doesn't take that into account, which results in > >>>>>> erroneous extra interrupts in the guest caused by premature re-assert of an > >>>> unacknowledged IRQ by the host. > >>>>> Interesting... How it behaviors in native side? > >>>> In native it behaves correctly, since Linux masks such a oneshot interrupt at the > >>>> beginning of hardirq, so that the EOI at the end of hardirq doesn't result in its > >>>> immediate re-assert, and then unmasks it later, after its threaded irq handler > >>>> completes. > >>>> > >>>> In handle_fasteoi_irq(): > >>>> > >>>> if (desc->istate & IRQS_ONESHOT) > >>>> mask_irq(desc); > >>>> > >>>> handle_irq_event(desc); > >>>> > >>>> cond_unmask_eoi_irq(desc, chip); > >>>> > >>>> > >>>> and later in unmask_threaded_irq(): > >>>> > >>>> unmask_irq(desc); > >>>> > >>>> I also mentioned that in patch #3 description: > >>>> "Linux keeps such interrupt masked until its threaded handler finishes, to > >>>> prevent the EOI from re-asserting an unacknowledged interrupt. > >>> That makes sense. Can you include the full story in cover letter too? > >> Ok, I will. > >> > >>> > >>>> However, with KVM + vfio (or whatever is listening on the resamplefd) we don't > >>>> check that the interrupt is still masked in the guest at the moment of EOI. > >>>> Resamplefd is notified regardless, so vfio prematurely unmasks the host > >>>> physical IRQ, thus a new (unwanted) physical interrupt is generated in the host > >>>> and queued for injection to the guest." > > Sorry to barge in pretty late in the conversation (just been Cc'd on > > this), but why shouldn't the resamplefd be notified? If there has been > yeah sorry to get you involved here ;-) No problem! > > an EOI, a new level must be made visible to the guest interrupt > > controller, no matter what the state of the interrupt masking is. > > > > Whether this new level is actually *presented* to a vCPU is another > > matter entirely, and is arguably a problem for the interrupt > > controller emulation. > > FWIU on guest EOI the physical line is still asserted so the pIRQ is > immediatly re-sampled by the interrupt controller (because the > resamplefd unmasked the physical IRQ) and recorded as a guest IRQ > (although it is masked at guest level). When the guest actually unmasks > the vIRQ we do not get a chance to re-evaluate the physical line level. Indeed, and maybe this is what should be fixed instead of moving the resampling point around (I was suggesting something along these lines in [1]). We already do this on arm64 for the timer, and it should be easy enough it generalise to any interrupt backed by the GIC (there is an in-kernel API to sample the pending state). No idea how that translate for other architectures though. M. [1] https://lore.kernel.org/r/87mtccbie4.wl-maz@xxxxxxxxxx -- Without deviation from the norm, progress is not possible.