Hi Micah, On 1/25/21 7:32 PM, Micah Morton wrote: > On Mon, Jan 25, 2021 at 12:31 PM Auger Eric <eric.auger@xxxxxxxxxx> wrote: >> >> Hi Micah, >> >> On 1/25/21 4:46 PM, Micah Morton wrote: >>> Hi Eric, >>> >>> I was recently looking into some vfio-platform passthrough stuff and >>> came across a device I wanted to assign to a guest that uses a ONESHOT >>> type interrupt (these type of interrupts seem to be quite common, on >>> ARM at least). The semantics for ONESHOT interrupts are a bit >>> different from regular level triggered interrupts as I'll describe >>> here: >>> >>> The normal generic code flow for level-triggered interrupts is as follows: >>> >>> - regular type[1]: mask[2] the irq, then run the handler, then >>> unmask[3] the irq and done >> >> VFIO level sensitive interrupts are "automasked". See slide 10 of >> https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf if this can help. > > When you say "automasked" / "mask and deactivate" are you referring to > the disable_irq_nosync() call in vfio_platform > (https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154)? Yes that's what I meant > >> >> When the guest deactivates the virtual IRQ, this causes a maintenance >> interrupt on host. This occurence causes kvm_notify_acked_irq() to be >> called and this latter unmasks the physical IRQ. > > Are you saying kvm_notify_acked_irq() causes > vfio_platform_unmask_handler() > (https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94) > to be called? Makes sense if so. Yes this ends up calling irqfd_resampler_ack which toggles the virtual irq line down and calls the unmask eventfd > >>> >>> - fasteoi type[4]: run the handler, then eoi[5] the irq and done >>> >>> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking >>> because that is assumed to be handled transparently by "modern forms >>> of interrupt handlers, which handle the flow details in hardware" >>> >>> ONESHOT type interrupts are a special case of the fasteoi type >>> described above. They rely on the driver registering a threaded >>> handler for the interrupt and assume the irq line will remain masked >>> until the threaded handler completes, at which time the line will be >>> unmasked. TL;DR: >>> >>> - mask[6] the irq, run the handler, and potentially eoi[7] the irq, >>> then unmask[8] later when the threaded handler has finished running. >> >> could you point me to the exact linux handler (is it >> handle_fasteoi_irq?) or Why do you say "potentially". Given the details >> above, the guest EOI would unmask the IRQ at physical level. We do not >> have any hook KVM/VFIO on the guest unmap. > > Yep. handle_fasteoi_irq(). By "potentially" I just meant the eoi is > only called for ONESHOT irqs if this check passes: !(chip->flags & > IRQCHIP_EOI_THREADED). Not actually sure what this check is about > since I didn't look into it. I checked IRQCHIP_EOI_THREADED and it does not seem to be supported by GIC irqchip. so this would mean that the EOI is not called in our case? > > So far I was just talking about irq operations that happen in the > host. Not sure I quite understand your last two sentences about guest > EOI and guest unmap. sorry I meant unmask instead of unmap :-( The closure of the physical IRQ is triggered by the deactivation of the guest virtual IRQ. Let's first assume you keep the existing "auto-masked" level sensitive IRQ on host side (vfio platform driver), on the guest driver side you are going to have a oneshot IRQ. What needs to be understood is when does the deactivation happen on guest side. > >>> >>> For vfio-platform irq forwarding, there is no existing function in >>> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate >>> for registering as the threaded handler for a ONESHOT interrupt in the >>> case we want to request the ONESHOT irq with >>> request_threaded_irq()[10]. Moreover, we can't just register a >>> threaded function that simply immediately returns IRQ_HANDLED (as is >>> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]), >>> since that would cause the IRQ to be unmasked[13] immediately, before >> sorry I know you reworked that several times for style issue but [13] >> does not match unmask(). > > That's actually the link I intended, irq_finalize_oneshot() will be > called after the threaded interrupt handler returns, and that's when > the IRQ will be unmasked. Hum this points to something within irq_thread_check_affinity(). Anyway. > >>> the userspace/guest driver has had any chance to service the >>> interrupt. >>> >>> The most obvious way I see to handle this is to add a threaded handler >>> to vfio_platform_irq.c that waits until the userspace/guest driver has >>> serviced the interrupt and the unmask_handler[14] has been called, at >>> which point it returns IRQ_HANDLED so the generic IRQ code in the host >>> can finally unmask the interrupt. >> this is done on guest EOI at the moment. > > By "this", I think you mean enable_irq() in vfio_platform_irq.c? right > >>> >>> Does this sound like a reasonable approach and something you would be >>> fine with adding to vfio-platform? >> Well I think it is interesting to do a pre-study and make sure we agree >> on the terminology, IRQ flow and problems that would need to be solved. >> Then we need to determine if it is worth the candle, if this support >> would speed up the vfio-platform usage (I am not sure at this point as I >> think there are more important drags like the lack of specification/dt >> integration for instance). >> >> Besides we need to make sure we are not launching into a huge effort for >> attempting to assign a device that does not fit well into the vfio >> platform scope (dma capable, reset, device tree). > > Yes, agreed. > >> >> If so I could get started looking >>> at the implementation for how to sleep in the threaded handler in >>> vfio-platform until the unmask_handler is called. The most tricky/ugly >>> part of this is that DT has no knowledge of irq ONESHOT-ness, as it >>> only contains info regarding active-low vs active-high and edge vs >>> level trigger. That means that vfio-platform can't figure out that a >>> device uses a ONESHOT irq in a similar way to how it queries[15] the >>> trigger type, and by extension QEMU can't learn this information >>> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way >>> of knowing (i.e. command line option to QEMU). >> Indeed that's not really appealing. >>> >>> I guess potentially another option would be to treat ONESHOT >>> interrupts like regular level triggered interrupts from the >>> perspective of vfio-platform, but somehow ensure the interrupt stays >>> masked during injection to the guest, rather than just disabled. >> I need this to be clarified actually. I am confused by the automasked >> terminology now. > > My note below tries to explain a bit but AFAICT there's some confusion > in the VFIO code (both platform and PCI) mixing the IRQ terms > mask/unmask with disable/enable, which I think are not quite > interchangeable concepts. IIUC irq enable/disable is a high(er) level > concept that means that even if the irqchip HW sees interrupts coming > in, the kernel refrains from calling the interrupt handler for the IRQ > -- but the kernel keeps track of interrupts that come in while an IRQ > is disabled, in order to know (once the IRQ gets enabled again by the > driver) that there are pending interrupts from the HW. On the other > hand, IRQ masking is a lower level irqchip operation that truly masks > the IRQs at the hardware level, so the kernel doesn't even know or > care if the line is being activated/deactivated while the IRQ is > masked. Looking again at the genirq code, irq/chip.c __irq_disable, actual implementation at HW level depends on the irqchip caps. GIC does not implement the irq_enable() cb so this ends up calling mask_irq which eventually calls gic_mask_irq and toggles the enable state for the given IRQ. Thanks Eric > > I don't have a great explanation for why ONESHOT interrupts are even a > thing in the first place, maybe the HW design means you're going to > get a bunch of garbage/meaningless ups and downs on the interrupt line > while the interrupt is being serviced and they should never be taken > into consideration in any way by the kernel until the IRQ is unmasked > again? > >> >> Thanks >> >> Eric >> >> I'm >>> not sure whether this could cause legitimate interrupts coming from >>> devices to be missed while the injection for an existing interrupt is >>> underway, but maybe this is a rare enough scenario that we wouldn't >>> care. The main issue with this approach is that handle_level_irq()[16] >>> will try to unmask the irq out from under us after we start the >>> injection (as it is already masked before >>> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at >>> the irqchip level supports nesting or not. >>> >>> Let me know if you think either of these are viable options for adding >>> ONESHOT interrupt forwarding support to vfio-platform? >>> >>> Thanks, >>> Micah >>> >>> >>> >>> >>> Additional note about level triggered vs ONESHOT irq forwarding: >>> For the regular type of level triggered interrupt described above, the >>> vfio handler will call disable_irq_nosync()[18] before the >>> handle_level_irq() function unmasks the irq and returns. This ensures >>> if new interrupts come in on the line while the existing one is being >>> handled by the guest (and the irq is therefore disabled), that the >>> vfio_automasked_irq_handler() isn’t triggered again until the >>> vfio_platform_unmask_handler() function has been triggered by the >>> guest (causing the irq to be re-enabled[19]). In other words, the >>> purpose of the irq enable/disable that already exists in vfio-platform >>> is a higher level concept that delays handling of additional >>> level-triggered interrupts in the host until the current one has been >>> handled in the guest. >>> >>> This means that the existing level triggered interrupt forwarding >>> logic in vfio/vfio-platform is not sufficient for handling ONESHOT >>> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a >>> regular level triggered interrupt in the host and use the existing >>> vfio forwarding code). The masking that needs to happen for ONESHOT >>> interrupts is at the lower level of the irqchip mask/unmask in that >>> the ONESHOT irq needs to remain masked (not just disabled) until the >>> driver’s threaded handler has completed. >>> >>> >>> >>> >>> [1] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642 >>> [2] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L414 >>> [3] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L619 >>> [4] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L702 >>> [5] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688 >>> [6] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L724 >>> [7] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688 >>> [8] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028 >>> [9] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c >>> [10] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L2038 >>> [11] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L167 >>> [12] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142 >>> [13] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028 >>> [14] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94 >>> [15] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L310 >>> [16] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642 >>> [17] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142 >>> [18] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154 >>> [19] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L87 >>> >> >