On Tue, Jul 21, 2020 at 09:02:15AM -0700, Dave Jiang wrote: > v2: > IMS (now dev-msi): > With recommendations from Jason/Thomas/Dan on making IMS more generic: > Pass a non-pci generic device(struct device) for IMS management instead of mdev > Remove all references to mdev and symbol_get/put > Remove all references to IMS in common code and replace with dev-msi > remove dynamic allocation of platform-msi interrupts: no groups,no new msi list or list helpers > Create a generic dev-msi domain with and without interrupt remapping enabled. > Introduce dev_msi_domain_alloc_irqs and dev_msi_domain_free_irqs apis I didn't dig into the details of irq handling to really check this, but the big picture of this is much more in line with what I would expect for this kind of ability. > Link to previous discussions with Jason: > https://lore.kernel.org/lkml/57296ad1-20fe-caf2-b83f-46d823ca0b5f@xxxxxxxxx/ > The emulation part that can be moved to user space is very small due to the majority of the > emulations being control bits and need to reside in the kernel. We can revisit the necessity of > moving the small emulation part to userspace and required architectural changes at a later time. The point here is that you already have a user space interface for these queues that already has kernel support to twiddle the control bits. Generally I'd expect extending that existing kernel code to do the small bit more needed for mapping the queue through to PCI emulation to be smaller than the 2kloc of new code here to put all the emulation and support framework in the kernel, and exposes a lower attack surface of kernel code to the guest. > The kernel can specify the requirements for these callback functions > (e.g., the driver is not expected to block, or not expected to take > a lock in the callback function). I didn't notice any of this in the patch series? What is the calling context for the platform_msi_ops ? I think I already mentioned that ideally we'd need blocking/sleeping. The big selling point is that IMS allows this data to move off-chip, which means accessing it is no longer just an atomic write to some on-chip memory. These details should be documented in the comment on top of platform_msi_ops I'm actually a little confused how idxd_ims_irq_mask() manages this - I thought IRQ masking should be synchronous, shouldn't there at least be a flushing read to ensure that new MSI's are stopped and any in flight are flushed to the APIC? Jason