Hi Eric, > -----Original Message----- > From: Eric Auger <eric.auger@xxxxxxxxxx> > Sent: Thursday, August 4, 2022 7:45 AM > To: Liu, Rong L <rong.l.liu@xxxxxxxxx>; Dmytro Maluka > <dmy@xxxxxxxxxxxx>; Christopherson,, Sean <seanjc@xxxxxxxxxx> > Cc: Micah Morton <mortonm@xxxxxxxxxxxx>; Alex Williamson > <alex.williamson@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx; Paolo Bonzini > <pbonzini@xxxxxxxxxx>; Tomasz Nowicki <tn@xxxxxxxxxxxx>; Grzegorz > Jaszczyk <jaz@xxxxxxxxxxxx>; Dmitry Torokhov <dtor@xxxxxxxxxx> > Subject: Re: Add vfio-platform support for ONESHOT irq forwarding? > > Hi, > On 7/12/22 18:02, Liu, Rong L wrote: > > Hi Sean and Dmytro, > > > >> -----Original Message----- > >> From: Dmytro Maluka <dmy@xxxxxxxxxxxx> > >> Sent: Thursday, July 7, 2022 10:39 AM > >> To: Christopherson,, Sean <seanjc@xxxxxxxxxx> > >> Cc: Auger Eric <eric.auger@xxxxxxxxxx>; Micah Morton > >> <mortonm@xxxxxxxxxxxx>; Alex Williamson > >> <alex.williamson@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx; Paolo Bonzini > >> <pbonzini@xxxxxxxxxx>; Liu, Rong L <rong.l.liu@xxxxxxxxx>; Tomasz > >> Nowicki <tn@xxxxxxxxxxxx>; Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx>; > >> Dmitry Torokhov <dtor@xxxxxxxxxx> > >> Subject: Re: Add vfio-platform support for ONESHOT irq forwarding? > >> > >> On 7/7/22 17:00, Sean Christopherson wrote: > >>> On Thu, Jul 07, 2022, Dmytro Maluka wrote: > >>>> Hi Sean, > >>>> > >>>> On 7/6/22 10:39 PM, Sean Christopherson wrote: > >>>>> On Wed, Jul 06, 2022, Dmytro Maluka wrote: > >>>>>> This is not a problem on native, since for oneshot irq we keep the > >>>>>> interrupt masked until the thread exits, so that the EOI at the end > >>>>>> of hardirq doesn't result in immediate re-assert. In vfio + KVM > >>>>>> case, however, the host doesn't check that the interrupt is still > >>>>>> masked in the guest, so > >>>>>> vfio_platform_unmask() is called regardless. > >>>>> Isn't not checking that an interrupt is unmasked the real bug? > >>>>> Fudging around vfio (or whatever is doing the premature > unmasking) > >>>>> bugs by delaying an ack notification in KVM is a hack, no? > >>>> Yes, not checking that an interrupt is unmasked is IMO a bug, and > my > >>>> patch actually adds this missing checking, only that it adds it in > >>>> KVM, not in VFIO. :) > >>>> > >>>> Arguably it's not a bug that VFIO is not checking the guest interrupt > >>>> state on its own, provided that the resample notification it receives > >>>> is always a notification that the interrupt has been actually acked. > >>>> That is the motivation behind postponing ack notification in KVM in > >>>> my patch: it is to ensure that KVM "ack notifications" are always > >>>> actual ack notifications (as the name suggests), not just "eoi > >> notifications". > >>> But EOI is an ACK. It's software saying "this interrupt has been > >> consumed". > >> > >> Ok, I see we've had some mutual misunderstanding of the term "ack" > >> here. > >> EOI is an ACK in the interrupt controller sense, while I was talking > about > >> an ACK in the device sense, i.e. a device-specific action, done in a > device > >> driver's IRQ handler, which makes the device de-assert the IRQ line (so > >> that the IRQ will not get re-asserted after an EOI or unmask). > >> > >> So the problem I'm trying to fix stems from the peculiarity of > "oneshot" > >> interrupts: an ACK in the device sense is done in a threaded handler, > i.e. > >> after an ACK in the interrupt controller sense, not before it. > >> > >> In the meantime I've realized one more reason why my patch is wrong. > >> kvm_notify_acked_irq() is an internal KVM thing which is supposed to > >> notify interested parts of KVM about an ACK rather in the interrupt > >> controller sense, i.e. about an EOI as it is doing already. > >> > >> VFIO, on the other hand, rather expects a notification about an ACK in > the > >> device sense. So it still seems a good idea to me to postpone sending > >> notifications to VFIO until an IRQ gets unmasked, but this postponing > >> should be done not for the entire kvm_notify_acked_irq() but only for > >> eventfd_signal() on resamplefd in irqfd_resampler_ack(). > >> > >> Thanks for making me think about that. > >> > >>>> That said, your idea of checking the guest interrupt status in VFIO > >>>> (or whatever is listening on the resample eventfd) makes sense to > me > >>>> too. The problem, though, is that it's KVM that knows the guest > >>>> interrupt status, so KVM would need to let VFIO/whatever know it > >>>> somehow. (I'm assuming we are focusing on the case of KVM kernel > >>>> irqchip, not userspace or split irqchip.) So do you have in mind > >>>> adding something like "maskfd" and "unmaskfd" to KVM IRQFD > >> interface, > >>>> in addition to resamplefd? If so, I'm actually in favor of such an > >>>> idea, as I think it would be also useful for other purposes, regardless > >> of oneshot interrupts. > >>> Unless I'm misreading things, KVM already provides a mask notifier, > >>> irqfd just needs to be wired up to use > >> kvm_(un)register_irq_mask_notifier(). > >> > > Interesting... I initially thought that kvm doesn't "trap" on ioapic's > mmio > > write. However, I just traced kvm/ioapic.c and it turns out > > ioapic_write_indirect() was called many times. Does trapping on > ioapic's mmio > > write cause vmexit on edge-triggered interrupt exit? It seems the case > because > > IOREGSEL and IOWIN of IOAPIC are memory mapped but not the > redirection entry > > register for each IRQ (that is why the name indirect_write), in order to > unmask > > redirection entry register on the exit of each interrupt (edge-triggered > or > > level-triggered), kernel needs to write to IORESEL, which means vmexit > if kvm > > traps on ioapic's mmio write. However, for pass-thru device which > uses > > edge-triggered interrupt (handled by vfio or something similar), > interrupt > > (pIRQ) is enabled by vfio and it seems unnecessary to cause a vmexit > when guest > > updates virtual ioapic. I think the situation is similar for level-triggered > > interrupt. So 2 vmexits for each level-triggered interrupt completion, > one for > > EOI on lapic and another for unmask of IOAPIC register. Does this > sound right? > > I thought with vfio (or similar architecture), there is no vmexit > necessary on > > edge-triggered interrupt completion and only one vmexit for level > triggered > > interrupt completion, except the caveats of oneshot interrupt. Maybe I > > misunderstand something? > Currently, no vmexit for edge-sensitive and 1 vmexit for level-sensitive > is what happens on ARM shared peripheral interrupts at least. > Note there is one setup that could remove the need for the vmexit on > vEOI: irq_bypass mode > (https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf slide > 12-14): > on GIC you have a mode that allows automatic completion of the > physical > IRQ when the corresponding vIRQ is completed. This mode would not be > compatible with oneshort_irq. > At some point we worked on this enablement but given the lack of > vfio-platform customers this work was paused so we still have the > mask/unmask vfio dance. > > Thanks > > Eric > Thanks for the info. I am not familiar with ARM but it is interesting to know the difference between 2 architectures. > > > >> Thanks for the tip. I'll take a look into implementing this idea. > >> > >> It seems you agree that fixing this issue via a change in KVM (in irqfd, > not > >> in ioapic) seems to be the way to go. > >> > >> An immediate problem I see with > kvm_(un)register_irq_mask_notifier() > >> is that it is currently available for x86 only. Also, mask notifiers are > called > >> under ioapic->lock (I'm not sure yet if that is good or bad news for us). > >> > >> Thanks, > >> Dmytro