Hi Marc, On 8/10/22 08:51, Marc Zyngier wrote: > On Wed, 10 Aug 2022 00:30:29 +0100, > Dmytro Maluka <dmy@xxxxxxxxxxxx> wrote: >> On 8/9/22 10:01 PM, Dong, Eddie wrote: >>> >>>> -----Original Message----- >>>> From: Dmytro Maluka <dmy@xxxxxxxxxxxx> >>>> Sent: Tuesday, August 9, 2022 12:24 AM >>>> To: Dong, Eddie <eddie.dong@xxxxxxxxx>; Christopherson,, Sean >>>> <seanjc@xxxxxxxxxx>; Paolo Bonzini <pbonzini@xxxxxxxxxx>; >>>> kvm@xxxxxxxxxxxxxxx >>>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; >>>> Borislav Petkov <bp@xxxxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>; >>>> x86@xxxxxxxxxx; H. Peter Anvin <hpa@xxxxxxxxx>; linux- >>>> kernel@xxxxxxxxxxxxxxx; Eric Auger <eric.auger@xxxxxxxxxx>; Alex >>>> Williamson <alex.williamson@xxxxxxxxxx>; Liu, Rong L <rong.l.liu@xxxxxxxxx>; >>>> Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>; Tomasz Nowicki >>>> <tn@xxxxxxxxxxxx>; Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx>; >>>> upstream@xxxxxxxxxxxx; Dmitry Torokhov <dtor@xxxxxxxxxx> >>>> Subject: Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding >>>> >>>> On 8/9/22 1:26 AM, Dong, Eddie wrote: >>>>>> The existing KVM mechanism for forwarding of level-triggered >>>>>> interrupts using resample eventfd doesn't work quite correctly in the >>>>>> case of interrupts that are handled in a Linux guest as oneshot >>>>>> interrupts (IRQF_ONESHOT). Such an interrupt is acked to the device >>>>>> in its threaded irq handler, i.e. later than it is acked to the >>>>>> interrupt controller (EOI at the end of hardirq), not earlier. The >>>>>> existing KVM code doesn't take that into account, which results in >>>>>> erroneous extra interrupts in the guest caused by premature re-assert of an >>>> unacknowledged IRQ by the host. >>>>> Interesting... How it behaviors in native side? >>>> In native it behaves correctly, since Linux masks such a oneshot interrupt at the >>>> beginning of hardirq, so that the EOI at the end of hardirq doesn't result in its >>>> immediate re-assert, and then unmasks it later, after its threaded irq handler >>>> completes. >>>> >>>> In handle_fasteoi_irq(): >>>> >>>> if (desc->istate & IRQS_ONESHOT) >>>> mask_irq(desc); >>>> >>>> handle_irq_event(desc); >>>> >>>> cond_unmask_eoi_irq(desc, chip); >>>> >>>> >>>> and later in unmask_threaded_irq(): >>>> >>>> unmask_irq(desc); >>>> >>>> I also mentioned that in patch #3 description: >>>> "Linux keeps such interrupt masked until its threaded handler finishes, to >>>> prevent the EOI from re-asserting an unacknowledged interrupt. >>> That makes sense. Can you include the full story in cover letter too? >> Ok, I will. >> >>> >>>> However, with KVM + vfio (or whatever is listening on the resamplefd) we don't >>>> check that the interrupt is still masked in the guest at the moment of EOI. >>>> Resamplefd is notified regardless, so vfio prematurely unmasks the host >>>> physical IRQ, thus a new (unwanted) physical interrupt is generated in the host >>>> and queued for injection to the guest." > Sorry to barge in pretty late in the conversation (just been Cc'd on > this), but why shouldn't the resamplefd be notified? If there has been yeah sorry to get you involved here ;-) > an EOI, a new level must be made visible to the guest interrupt > controller, no matter what the state of the interrupt masking is. > > Whether this new level is actually *presented* to a vCPU is another > matter entirely, and is arguably a problem for the interrupt > controller emulation. FWIU on guest EOI the physical line is still asserted so the pIRQ is immediatly re-sampled by the interrupt controller (because the resamplefd unmasked the physical IRQ) and recorded as a guest IRQ (although it is masked at guest level). When the guest actually unmasks the vIRQ we do not get a chance to re-evaluate the physical line level. When running native, when EOI is sent, the physical line is still asserted but the IRQ is masked. When unmasking, the line is de-asserted. Thanks Eric > > For example on arm64, we expect to be able to read the pending state > of an interrupt from the guest irrespective of the masking state of > that interrupt. Any change to the interrupt flow should preserve this. > > Thankfully, we don't have the polarity issue (there is no such thing > in the GIC architecture) and we only deal with pending/not-pending. > > Thanks, > > M. >