> From: Alex Williamson <alex.williamson@xxxxxxxxxx> > Sent: Friday, November 3, 2023 11:51 PM > > On Fri, 3 Nov 2023 07:23:13 +0000 > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx> > > > Sent: Friday, November 3, 2023 5:14 AM > > > > > > On Thu, 2 Nov 2023 03:14:09 +0000 > > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > > > > > > > From: Tian, Kevin > > > > > Sent: Thursday, November 2, 2023 10:52 AM > > > > > > > > > > > > > > > > > Without an in-tree user of this code, we're just chopping up code for > > > > > > no real purpose. There's no reason that a variant driver requiring > IMS > > > > > > couldn't initially implement their own SET_IRQS ioctl. Doing that > > > > > > > > > > this is an interesting idea. We haven't seen a real usage which wants > > > > > such MSI emulation on IMS for variant drivers. but if the code is > > > > > simple enough to demonstrate the 1st user of IMS it might not be > > > > > a bad choice. There are additional trap-emulation required in the > > > > > device MMIO bar (mostly copying MSI permission entry which > contains > > > > > PASID info to the corresponding IMS entry). At a glance that area > > > > > is 4k-aligned so should be doable. > > > > > > > > > > > > > misread the spec. the MSI-X permission table which provides > > > > auxiliary data to MSI-X table is not 4k-aligned. It sits in the 1st > > > > 4k page together with many other registers. emulation of them > > > > could be simple with a native read/write handler but not sure > > > > whether any of them may sit in a hot path to affect perf due to > > > > trap... > > > > > > I'm not sure if you're referring to a specific device spec or the PCI > > > spec, but the PCI spec has long included an implementation note > > > suggesting alignment of the MSI-X vector table and pba and separation > > > from CSRs, and I see this is now even more strongly worded in the 6.0 > > > spec. > > > > > > Note though that for QEMU, these are emulated in the VMM and not > > > written through to the device. The result of writes to the vector > > > table in the VMM are translated to vector use/unuse operations, which > > > we see at the kernel level through SET_IRQS ioctl calls. Are you > > > expecting to get PASID information written by the guest through the > > > emulated vector table? That would entail something more than a simple > > > IMS backend to MSI-X frontend. Thanks, > > > > > > > I was referring to IDXD device spec. Basically it allows a process to > > submit a descriptor which contains a completion interrupt handle. > > The handle is the index of a MSI-X entry or IMS entry allocated by > > the idxd driver. To mark the association between application and > > related handles the driver records the PASID of the application > > in an auxiliary structure for MSI-X (called MSI-X permission table) > > or directly in the IMS entry. This additional info includes whether > > an MSI-X/IMS entry has PASID enabled and if yes what is the PASID > > value to be checked against the descriptor. > > > > As you said virtualizing MSI-X table itself is via SET_IRQS and it's > > 4k aligned. Then we also need to capture guest updates to the MSI-X > > permission table and copy the PASID information into the > > corresponding IMS entry when using the IMS backend. It's MSI-X > > permission table not 4k aligned then trapping it will affect adjacent > > registers. > > > > My quick check in idxd spec doesn't reveal an real impact in perf > > critical path. Most registers are configuration/control registers > > accessed at driver init time and a few interrupt registers related > > to errors or administrative purpose. > > Right, it looks like you'll need to trap writes to the MSI-X > Permissions Table via a sparse mmap capability to avoid assumptions > whether it lives on the same page as the MSI-X vector table or PBA. > Ideally the hardware folks have considered this to avoid any conflict > with latency sensitive registers. > > The variant driver would use this for collecting the meta data relative > to the IMS interrupt, but this is all tangential to whether we > preemptively slice up vfio-pci-core's SET_IRQS ioctl or the iDXD driver > implements its own. Agree > > And just to be clear, I don't expect the iDXD variant driver to go to > extraordinary lengths to duplicate the core ioctl, we can certainly > refactor and export things where it makes sense, but I think it likely > makes more sense for the variant driver to implement the shell of the > ioctl rather than trying to multiplex the entire core ioctl with an ops > structure that's so intimately tied to the core implementation and > focused only on the MSI-X code paths. Thanks, > btw I'll let Reinette to decide whether she wants to implement such a variant driver or waits until idxd siov driver is ready to demonstrate the usage. One concern with the variant driver approach is lacking of a real-world usage (e.g. what IMS brings when guest only wants MSI-X on an assigned PF). Though it may provide a shorter path to enable the IMS backend support, also comes with the long-term maintenance burden.