Re: Add vfio-platform support for ONESHOT irq forwarding?

Auger Eric <eric.auger@xxxxxxxxxx> · Tue, 26 Jan 2021 17:05:03 +0100

Hi Micah,

On 1/26/21 4:31 PM, Micah Morton wrote:
> On Tue, Jan 26, 2021 at 3:48 AM Auger Eric <eric.auger@xxxxxxxxxx> wrote:
>>
>> Hi Micah,
>>
>> On 1/25/21 7:32 PM, Micah Morton wrote:
>>> On Mon, Jan 25, 2021 at 12:31 PM Auger Eric <eric.auger@xxxxxxxxxx> wrote:
>>>>
>>>> Hi Micah,
>>>>
>>>> On 1/25/21 4:46 PM, Micah Morton wrote:
>>>>> Hi Eric,
>>>>>
>>>>> I was recently looking into some vfio-platform passthrough stuff and
>>>>> came across a device I wanted to assign to a guest that uses a ONESHOT
>>>>> type interrupt (these type of interrupts seem to be quite common, on
>>>>> ARM at least). The semantics for ONESHOT interrupts are a bit
>>>>> different from regular level triggered interrupts as I'll describe
>>>>> here:
>>>>>
>>>>> The normal generic code flow for level-triggered interrupts is as follows:
>>>>>
>>>>> - regular type[1]: mask[2] the irq, then run the handler, then
>>>>> unmask[3] the irq and done
>>>>
>>>> VFIO level sensitive interrupts are "automasked". See slide 10 of
>>>> https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf if this can help.
>>>
>>> When you say "automasked" / "mask and deactivate" are you referring to
>>> the disable_irq_nosync() call in vfio_platform
>>> (https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154)?
>> Yes that's what I meant
>>>
>>>>
>>>> When the guest deactivates the virtual IRQ, this causes a maintenance
>>>> interrupt on host. This occurence causes kvm_notify_acked_irq() to be
>>>> called and this latter unmasks the physical IRQ.
>>>
>>> Are you saying kvm_notify_acked_irq() causes
>>> vfio_platform_unmask_handler()
>>> (https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94)
>>> to be called? Makes sense if so.
>> Yes this ends up calling irqfd_resampler_ack which toggles the virtual
>> irq line down and calls the unmask eventfd
>>>
>>>>>
>>>>> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
>>>>>
>>>>> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
>>>>> because that is assumed to be handled transparently by "modern forms
>>>>> of interrupt handlers, which handle the flow details in hardware"
>>>>>
>>>>> ONESHOT type interrupts are a special case of the fasteoi type
>>>>> described above. They rely on the driver registering a threaded
>>>>> handler for the interrupt and assume the irq line will remain masked
>>>>> until the threaded handler completes, at which time the line will be
>>>>> unmasked. TL;DR:
>>>>>
>>>>> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
>>>>> then unmask[8] later when the threaded handler has finished running.
>>>>
>>>> could you point me to the exact linux handler (is it
>>>> handle_fasteoi_irq?)  or Why do you say "potentially". Given the details
>>>> above, the guest EOI would unmask the IRQ at physical level. We do not
>>>> have any hook KVM/VFIO on the guest unmap.
>>>
>>> Yep. handle_fasteoi_irq(). By "potentially" I just meant the eoi is
>>> only called for ONESHOT irqs if this check passes: !(chip->flags &
>>> IRQCHIP_EOI_THREADED). Not actually sure what this check is about
>>> since I didn't look into it.
>> I checked IRQCHIP_EOI_THREADED and it does not seem to be supported by
>> GIC irqchip. so this would mean that the EOI is not called in our case?
> 
> Sounds like it, thanks for looking into that.
> 
>>
>>>
>>> So far I was just talking about irq operations that happen in the
>>> host. Not sure I quite understand your last two sentences about guest
>>> EOI and guest unmap.
>> sorry I meant unmask instead of unmap :-( The closure of the physical
>> IRQ is triggered by the deactivation of the guest virtual IRQ.
>> Let's first assume you keep the existing "auto-masked" level sensitive
>> IRQ on host side (vfio platform driver), on the guest driver side you
>> are going to have a oneshot IRQ. What needs to be understood is when
>> does the deactivation happen on guest side.
> 
> That's a good question, I need to go back and look at that.
> 
>>>
>>>>>
>>>>> For vfio-platform irq forwarding, there is no existing function in
>>>>> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
>>>>> for registering as the threaded handler for a ONESHOT interrupt in the
>>>>> case we want to request the ONESHOT irq with
>>>>> request_threaded_irq()[10]. Moreover, we can't just register a
>>>>> threaded function that simply immediately returns IRQ_HANDLED (as is
>>>>> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
>>>>> since that would cause the IRQ to be unmasked[13] immediately, before
>>>> sorry I know you reworked that several times for style issue but [13]
>>>> does not match unmask().
>>>
>>> That's actually the link I intended, irq_finalize_oneshot() will be
>>> called after the threaded interrupt handler returns, and that's when
>>> the IRQ will be unmasked.
>> Hum this points to something within irq_thread_check_affinity(). Anyway.
> 
> Sorry I was pointing the URL at a specific line in
> irq_finalize_oneshot() but it was probably confusing.
> https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L984
> is the function I'm talking about.
> 
>>>
>>>>> the userspace/guest driver has had any chance to service the
>>>>> interrupt.
>>>>>
>>>>> The most obvious way I see to handle this is to add a threaded handler
>>>>> to vfio_platform_irq.c that waits until the userspace/guest driver has
>>>>> serviced the interrupt and the unmask_handler[14] has been called, at
>>>>> which point it returns IRQ_HANDLED so the generic IRQ code in the host
>>>>> can finally unmask the interrupt.
>>>> this is done on guest EOI at the moment.
>>>
>>> By "this", I think you mean enable_irq() in vfio_platform_irq.c?
>> right
>>>
>>>>>
>>>>> Does this sound like a reasonable approach and something you would be
>>>>> fine with adding to vfio-platform?
>>>> Well I think it is interesting to do a pre-study and make sure we agree
>>>> on the terminology, IRQ flow and problems that would need to be solved.
>>>> Then we need to determine if it is worth the candle, if this support
>>>> would speed up the vfio-platform usage (I am not sure at this point as I
>>>> think there are more important drags like the lack of specification/dt
>>>> integration for instance).
>>>>
>>>> Besides we need to make sure we are not launching into a huge effort for
>>>> attempting to assign a device that does not fit well into the vfio
>>>> platform scope (dma capable, reset, device tree).
>>>
>>> Yes, agreed.
>>>
>>>>
>>>>  If so I could get started looking
>>>>> at the implementation for how to sleep in the threaded handler in
>>>>> vfio-platform until the unmask_handler is called. The most tricky/ugly
>>>>> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
>>>>> only contains info regarding active-low vs active-high and edge vs
>>>>> level trigger. That means that vfio-platform can't figure out that a
>>>>> device uses a ONESHOT irq in a similar way to how it queries[15] the
>>>>> trigger type, and by extension QEMU can't learn this information
>>>>> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
>>>>> of knowing (i.e. command line option to QEMU).
>>>> Indeed that's not really appealing.
>>>>>
>>>>> I guess potentially another option would be to treat ONESHOT
>>>>> interrupts like regular level triggered interrupts from the
>>>>> perspective of vfio-platform, but somehow ensure the interrupt stays
>>>>> masked during injection to the guest, rather than just disabled.
>>>> I need this to be clarified actually. I am confused by the automasked
>>>> terminology now.
>>>
>>> My note below tries to explain a bit but AFAICT there's some confusion
>>> in the VFIO code (both platform and PCI) mixing the IRQ terms
>>> mask/unmask with disable/enable, which I think are not quite
>>> interchangeable concepts. IIUC irq enable/disable is a high(er) level
>>> concept that means that even if the irqchip HW sees interrupts coming
>>> in, the kernel refrains from calling the interrupt handler for the IRQ
>>> -- but the kernel keeps track of interrupts that come in while an IRQ
>>> is disabled, in order to know (once the IRQ gets enabled again by the
>>> driver) that there are pending interrupts from the HW. On the other
>>> hand, IRQ masking is a lower level irqchip operation that truly masks
>>> the IRQs at the hardware level, so the kernel doesn't even know or
>>> care if the line is being activated/deactivated while the IRQ is
>>> masked.
>>
>> Looking again at the genirq code, irq/chip.c __irq_disable, actual
>> implementation at HW level depends on the irqchip caps. GIC does not
>> implement the irq_enable() cb so this ends up calling mask_irq which
>> eventually calls gic_mask_irq and toggles the enable state for the given
>> IRQ.
> 
> Are you talking about the lazy disable approach
> (https://elixir.bootlin.com/linux/v5.10.10/source/kernel/irq/chip.c#L371)?
> Sounds like you're implying IRQ_DISABLE_UNLAZY will be true for the
> irq when we're doing VFIO forwarding? Or else we would see an extra
> pending interrupt on the line during IRQ forwarding.

I tried to remember the path from __irq_disable SW flow downto the GIC
SPI programming. I was not implying the IRQ_DISABLE_UNLAZY flag.

As fas as I remember lazy disable approach was in use when I did the
integration. But that's just an optimization that avoids to access the
HW unnecessarily (ie. the handler may be invoked even if the irq is
disabled).

Thanks

Eric
> 
> Maybe setting the IRQ_DISABLE_UNLAZY flag is the solution for ONESHOT
> irq forwarding actually, so we guarantee the IRQ is masked immediately
> and avoid the lazy approach that could lead to an erroneous pending
> interrupt?
> 
>>
>> Thanks
>>
>> Eric
>>>
>>> I don't have a great explanation for why ONESHOT interrupts are even a
>>> thing in the first place, maybe the HW design means you're going to
>>> get a bunch of garbage/meaningless ups and downs on the interrupt line
>>> while the interrupt is being serviced and they should never be taken
>>> into consideration in any way by the kernel until the IRQ is unmasked
>>> again?
>>>
>>>>
>>>> Thanks
>>>>
>>>> Eric
>>>>
>>>>  I'm
>>>>> not sure whether this could cause legitimate interrupts coming from
>>>>> devices to be missed while the injection for an existing interrupt is
>>>>> underway, but maybe this is a rare enough scenario that we wouldn't
>>>>> care. The main issue with this approach is that handle_level_irq()[16]
>>>>> will try to unmask the irq out from under us after we start the
>>>>> injection (as it is already masked before
>>>>> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
>>>>> the irqchip level supports nesting or not.
>>>>>
>>>>> Let me know if you think either of these are viable options for adding
>>>>> ONESHOT interrupt forwarding support to vfio-platform?
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Additional note about level triggered vs ONESHOT irq forwarding:
>>>>> For the regular type of level triggered interrupt described above, the
>>>>> vfio handler will call disable_irq_nosync()[18] before the
>>>>> handle_level_irq() function unmasks the irq and returns. This ensures
>>>>> if new interrupts come in on the line while the existing one is being
>>>>> handled by the guest (and the irq is therefore disabled), that the
>>>>> vfio_automasked_irq_handler() isn’t triggered again until the
>>>>> vfio_platform_unmask_handler() function has been triggered by the
>>>>> guest (causing the irq to be re-enabled[19]). In other words, the
>>>>> purpose of the irq enable/disable that already exists in vfio-platform
>>>>> is a higher level concept that delays handling of additional
>>>>> level-triggered interrupts in the host until the current one has been
>>>>> handled in the guest.
>>>>>
>>>>> This means that the existing level triggered interrupt forwarding
>>>>> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
>>>>> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
>>>>> regular level triggered interrupt in the host and use the existing
>>>>> vfio forwarding code). The masking that needs to happen for ONESHOT
>>>>> interrupts is at the lower level of the irqchip mask/unmask in that
>>>>> the ONESHOT irq needs to remain masked (not just disabled) until the
>>>>> driver’s threaded handler has completed.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> [1] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
>>>>> [2] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L414
>>>>> [3] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L619
>>>>> [4] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L702
>>>>> [5] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
>>>>> [6] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L724
>>>>> [7] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
>>>>> [8] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
>>>>> [9] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c
>>>>> [10] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L2038
>>>>> [11] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L167
>>>>> [12] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
>>>>> [13] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
>>>>> [14] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94
>>>>> [15] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L310
>>>>> [16] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
>>>>> [17] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
>>>>> [18] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154
>>>>> [19] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L87
>>>>>
>>>>
>>>
>>
>