Hi Eric,
On 7/7/22 10:25 AM, Eric Auger wrote:
Again, this doesn't seem to be true. As explained in my reply to Alex
above, the guest deactivates (EOI) the vIRQ as soon as the vIRQ hardirq
handler completes, not when the vIRQ thread completes.
So VFIO unmask handler gets called too early, before the interrupt
gets serviced and acked in the vIRQ thread.
Fair enough, once the vIRQ hardirq handler completes, the physical IRQ gets unmasked.
This event occurs on guest EOI, which triggers the resamplefd. But what
is the state of the vIRQ? Isn't it still masked until the vIRQ thread
completes, preventing the physical IRQ from being propagated to the guest?
Even if the vIRQ is still masked by the time
vfio_automasked_irq_handler() signals the eventfd (which in itself is
not guaranteed, I guess), I believe KVM is buffering this event, so
after the vIRQ is unmasked, this new IRQ will be injected into the guest
anyway.
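
To make the sequence concrete, here is a minimal user-space sketch of the
ordering described above. It is only a toy model, not KVM or VFIO code: the
struct, its fields and every function name are hypothetical, and it assumes
(as I believe, but have not verified) that an event signalled while the vIRQ
is masked stays latched until unmask.

```c
#include <stdbool.h>

/* Hypothetical model of one level-triggered forwarded line. */
struct line_model {
    bool device_asserting; /* device still holds the line high */
    bool phys_masked;      /* host-side automask state */
    bool virq_masked;      /* guest has masked the vIRQ (oneshot) */
    bool virq_pending;     /* event latched for the guest (assumption) */
    int  injected;         /* interrupts actually delivered to the guest */
};

/* Host side: roughly what an automasked handler does (mask, then signal). */
static void model_host_irq(struct line_model *m)
{
    if (m->device_asserting && !m->phys_masked) {
        m->phys_masked = true;
        m->virq_pending = true;
    }
}

/* Deliver a latched event once the vIRQ is not masked. */
static void model_deliver(struct line_model *m)
{
    if (m->virq_pending && !m->virq_masked) {
        m->virq_pending = false;
        m->injected++;
    }
}

/* One device event handled by a oneshot (threaded) guest handler. */
static int model_one_event(void)
{
    struct line_model m = { 0 };

    m.device_asserting = true;  /* device raises its interrupt */
    model_host_irq(&m);
    model_deliver(&m);          /* first, legitimate injection */

    m.virq_masked = true;       /* guest hardirq handler masks the vIRQ */
    m.phys_masked = false;      /* guest EOI: resample unmasks the host IRQ */
    model_host_irq(&m);         /* device not serviced yet: fires again */
    model_deliver(&m);          /* blocked for now, but stays latched */

    m.device_asserting = false; /* vIRQ thread finally services the device */
    m.virq_masked = false;      /* ...and unmasks the vIRQ */
    model_deliver(&m);          /* latched event is injected anyway */

    return m.injected;
}
```

In this model model_one_event() returns 2: one device event ends up as two
guest interrupts, the second one being spurious.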
It seems the obvious fix is to postpone sending irq ack notifications
in KVM from EOI to unmask (for oneshot interrupts only). Luckily, we
don't need to provide KVM with the info that the given interrupt is
oneshot. KVM can just find it out from the fact that the interrupt is
masked at the time of EOI.
You mean the vIRQ, right?
Right.
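
A rough sketch of that idea, again as a toy model rather than actual KVM
code (struct virq_model and the model_* helpers are made-up names, and
send_ack() merely stands in for firing the ack notifier / resamplefd):

```c
#include <stdbool.h>

/* Hypothetical per-vIRQ state; not KVM's real data structures. */
struct virq_model {
    bool masked;       /* vIRQ currently masked by the guest */
    bool ack_deferred; /* EOI seen while masked, ack held back */
    int  acks_sent;    /* times the ack notification was delivered */
};

/* Stand-in for signalling the ack notifier (resamplefd). */
static void send_ack(struct virq_model *v)
{
    v->acks_sent++;
}

/* Guest EOI: a masked vIRQ is taken to mean "oneshot, still in progress". */
static void model_eoi(struct virq_model *v)
{
    if (v->masked)
        v->ack_deferred = true; /* postpone until unmask */
    else
        send_ack(v);            /* non-oneshot case: ack right away */
}

/* Guest mask/unmask: a deferred ack is delivered on unmask. */
static void model_set_mask(struct virq_model *v, bool masked)
{
    v->masked = masked;
    if (!masked && v->ack_deferred) {
        v->ack_deferred = false;
        send_ack(v);
    }
}
```

With this ordering the physical IRQ would only be unmasked after the vIRQ
thread has finished and the guest has unmasked the vIRQ, while interrupts
that are not masked at EOI keep the current behavior.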
Before we go further and invest more time in this thread, could you
please give us some additional context and help us understand the
stakes? This thread is from Jan 2021 and was
discontinued for a while. vfio-platform is currently only enabled on ARM
and maintained for very few devices which properly implement reset
callbacks and duly use an underlying IOMMU.
Sure. We are not really using vfio-platform for the devices we are
concerned with, since those are not DMA capable devices, and some of
them are not really platform devices but I2C or SPI devices. Instead we
are using (hopefully temporarily) Micah's module for forwarding
arbitrary IRQs [1][2] which mostly reimplements the VFIO irq forwarding
mechanism.
Also with a few simple hacks I managed to use vfio-platform for the same
thing (just as a PoC) and confirmed, unsurprisingly, that the problems
with oneshot interrupts are observed with vfio-platform as well.
[1]
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-5.10-manatee/virt/lib/platirqforward.c
[2]
https://lkml.kernel.org/kvm/CAJ-EccPU8KpU96PM2PtroLjdNVDbvnxwKwWJr2B+RBKuXEr7Vw@xxxxxxxxxxxxxx/T/
Thanks,
Dmytro