Re: [PATCH RFC v2 00/18] Add VFIO mediated device support and DEV-MSI support for the idxd driver

Jason Gunthorpe <jgg@xxxxxxxxxxxx> · Tue, 21 Jul 2020 13:45:27 -0300

On Tue, Jul 21, 2020 at 09:02:15AM -0700, Dave Jiang wrote:
> v2:
> IMS (now dev-msi):
> With recommendations from Jason/Thomas/Dan on making IMS more generic:
> Pass a non-pci generic device(struct device) for IMS management instead of mdev
> Remove all references to mdev and symbol_get/put
> Remove all references to IMS in common code and replace with dev-msi
> remove dynamic allocation of platform-msi interrupts: no groups,no new msi list or list helpers
> Create a generic dev-msi domain with and without interrupt remapping enabled.
> Introduce dev_msi_domain_alloc_irqs and dev_msi_domain_free_irqs apis

I didn't dig into the details of irq handling to really check this,
but the big picture of this is much more in line with what I would
expect for this kind of ability.

> Link to previous discussions with Jason:
> https://lore.kernel.org/lkml/57296ad1-20fe-caf2-b83f-46d823ca0b5f@xxxxxxxxx/
> The emulation part that can be moved to user space is very small due to the majority of the
> emulations being control bits and need to reside in the kernel. We can revisit the necessity of
> moving the small emulation part to userspace and required architectural changes at a later time.

The point here is that you already have a user space interface for
these queues that already has kernel support to twiddle the control
bits. Generally I'd expect that extending the existing kernel code to
do the small bit more needed for mapping the queue through to PCI
emulation would be smaller than the 2kloc of new code here, which puts
all the emulation and support framework in the kernel, and that it
would expose a smaller kernel attack surface to the guest.

> The kernel can specify the requirements for these callback functions
> (e.g., the driver is not expected to block, or not expected to take
> a lock in the callback function).

I didn't notice any of this in the patch series. What is the calling
context for the platform_msi_ops? I think I already mentioned that
ideally the callbacks would be allowed to block/sleep. The big selling
point is that IMS allows this data to move off-chip, which means
accessing it is no longer just an atomic write to some on-chip memory.

These details should be documented in the comment on top of
platform_msi_ops.
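
To illustrate the kind of comment meant here, a minimal sketch follows.
The struct, its fields and the stated contract are hypothetical and are
not taken from the series. In particular, "may sleep" is only the
assumption this sketch makes, since the actual calling context is the
open question.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-interrupt descriptor. */
struct example_msi_desc {
	unsigned int hwirq;
	int masked;
};

/**
 * struct example_msi_ops - device specific interrupt message store callbacks
 * @irq_mask:   stop the device from generating this interrupt
 * @irq_unmask: allow the device to generate this interrupt again
 *
 * Calling context (ASSUMED for this sketch, not stated by the series):
 * both callbacks are invoked from process context and may sleep, because
 * the message store may live off-chip. Both return 0 on success and only
 * return once the device has observed the change.
 */
struct example_msi_ops {
	int (*irq_mask)(struct example_msi_desc *desc);
	int (*irq_unmask)(struct example_msi_desc *desc);
};

/* Trivial in-memory implementations so the sketch can be exercised. */
static int example_irq_mask(struct example_msi_desc *desc)
{
	desc->masked = 1;
	return 0;
}

static int example_irq_unmask(struct example_msi_desc *desc)
{
	desc->masked = 0;
	return 0;
}

static const struct example_msi_ops example_ops = {
	.irq_mask   = example_irq_mask,
	.irq_unmask = example_irq_unmask,
};
```

Whatever the real answer turns out to be (atomic only, or allowed to
sleep), stating it next to the ops definition lets driver authors know
what they can do inside the callbacks.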

I'm actually a little confused about how idxd_ims_irq_mask() manages
this. I thought IRQ masking should be synchronous; shouldn't there at
least be a flushing read to ensure that new MSIs are stopped and any in
flight are flushed to the APIC?
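
Roughly the pattern in question, as a userspace model. The register
layout and function names below are made up for illustration and are not
from the idxd patches. In a driver the two accesses would typically be an
iowrite32() followed by an ioread32() of the same device.

```c
#include <stdint.h>

/* Hypothetical control word layout: bit 0 masks the interrupt. */
#define EXAMPLE_IMS_CTRL_MASK (1u << 0)

/* Stands in for a device MMIO control word in this model. */
struct example_ims_entry {
	volatile uint32_t ctrl;
};

/*
 * Posted write only. On real hardware this can return before the
 * device has seen the new mask bit, so interrupts may still arrive.
 */
static void example_ims_mask_posted(struct example_ims_entry *e)
{
	e->ctrl |= EXAMPLE_IMS_CTRL_MASK;
}

/*
 * Write followed by a read back. On PCI the read cannot complete until
 * the earlier posted write has reached the device, and its completion
 * is ordered behind MSI writes the device issued before it, so when
 * this returns the mask is in effect and earlier MSIs were delivered.
 */
static uint32_t example_ims_mask_sync(struct example_ims_entry *e)
{
	e->ctrl |= EXAMPLE_IMS_CTRL_MASK;
	return e->ctrl;	/* flushing read */
}
```

In this model both variants look identical from the CPU side, which is
exactly why the difference is easy to miss: only the second one gives
the caller a point after which the mask is known to have taken effect.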

Jason