> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Friday, December 10, 2021 4:59 AM
>
> On Thu, Dec 09, 2021 at 09:32:42PM +0100, Thomas Gleixner wrote:
> > On Thu, Dec 09 2021 at 12:21, Jason Gunthorpe wrote:
> > > On Thu, Dec 09, 2021 at 09:37:06AM +0100, Thomas Gleixner wrote:
> > > If we keep the MSI emulation in the hypervisor then MSI != IMS. The
> > > MSI code needs to write a addr/data pair compatible with the emulation
> > > and the IMS code needs to write an addr/data pair from the
> > > hypercall. Seems like this scenario is best avoided!
> > >
> > > From this perspective I haven't connected how virtual interrupt
> > > remapping helps in the guest? Is this a way to provide the hypercall
> > > I'm imagining above?
> >
> > That was my thought to avoid having different mechanisms.
> >
> > The address/data pair is computed in two places:
> >
> >   1) Activation of an interrupt
> >   2) Affinity setting on an interrupt
> >
> > Both configure the IRTE when interrupt remapping is in place.
> >
> > In both cases a vector is allocated in the vector domain and based on
> > the resulting target APIC / vector number pair the IRTE is
> > (re)configured.
> >
> > So putting the hypercall into the vIRTE update is the obvious
> > place. Both activation and affinity setting can fail and propagate an
> > error code down to the originating caller.
> >
> > Hmm?
>
> Okay, I think I get it. Would be nice to have someone from intel
> familiar with the vIOMMU protocols and qemu code remark what the
> hypervisor side can look like.
>
> There is a bit more work here, we'd have to change VFIO to somehow
> entirely disconnect the kernel IRQ logic from the MSI table and
> directly pass control of it to the guest after the hypervisor IOMMU IR
> secures it.
> ie directly mmap the msi-x table into the guest
>

It's supported already:

/*
 * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
 * which allows direct access to non-MSIX registers which happened to be within
 * the same system page.
 *
 * Even though the userspace gets direct access to the MSIX data, the existing
 * VFIO_DEVICE_SET_IRQS interface must still be used for MSIX configuration.
 */
#define VFIO_REGION_INFO_CAP_MSIX_MAPPABLE 3

IIRC this was introduced for PPC, where a device can have MSI-X in the same
BAR as other MMIO registers. Trapping MSI-X degrades performance on accesses
to the adjacent registers. MSI-X can be mapped by userspace because PPC
already uses a hypercall mechanism for interrupts. Though I'm unclear about
the details, it sounds like a similar usage to what is proposed here.

Thanks
Kevin
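For reference, here is a minimal sketch of how a userspace VMM might detect the MSIX-mappable capability quoted above. It calls VFIO_DEVICE_GET_REGION_INFO twice: once to learn the required argsz, then again with a buffer large enough for the capability chain. It then walks the struct vfio_info_cap_header chain looking for VFIO_REGION_INFO_CAP_MSIX_MAPPABLE.

The helper names region_has_msix_mappable() and query_msix_mappable() are hypothetical. The sketch also assumes the caller already holds an open VFIO device fd and knows which region index contains the MSI-X table. As the header comment says, finding the capability only permits the mmap; MSI-X configuration still goes through VFIO_DEVICE_SET_IRQS.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: walk the capability chain in a buffer filled by
 * VFIO_DEVICE_GET_REGION_INFO and report whether the region advertises
 * VFIO_REGION_INFO_CAP_MSIX_MAPPABLE. `len` is the size of `buf` in bytes.
 */
bool region_has_msix_mappable(const void *buf, size_t len)
{
    const struct vfio_region_info *info = buf;
    size_t off;
    unsigned int hops = 0;

    if (len < sizeof(*info) || !(info->flags & VFIO_REGION_INFO_FLAG_CAPS))
        return false;

    /* cap_offset and each header's `next` are byte offsets from the start
     * of the buffer; an offset of 0 terminates the chain. */
    off = info->cap_offset;
    while (off != 0 && off + sizeof(struct vfio_info_cap_header) <= len) {
        const struct vfio_info_cap_header *hdr =
            (const struct vfio_info_cap_header *)((const char *)buf + off);

        if (hdr->id == VFIO_REGION_INFO_CAP_MSIX_MAPPABLE)
            return true;
        if (++hops > 64) /* bail out on a malformed (e.g. cyclic) chain */
            break;
        off = hdr->next;
    }
    return false;
}

/*
 * Hypothetical helper: query region `index` of an open VFIO device fd.
 * Returns 1 if MSI-X data in that region may be mmapped, 0 if not, -1 on error.
 */
int query_msix_mappable(int device_fd, uint32_t index)
{
    struct vfio_region_info probe = { .argsz = sizeof(probe), .index = index };
    struct vfio_region_info *info;
    size_t len;
    int ret = -1;

    /* First call: the kernel raises argsz if the capability chain does not fit. */
    if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &probe) < 0)
        return -1;

    len = probe.argsz > sizeof(probe) ? probe.argsz : sizeof(probe);
    info = calloc(1, len);
    if (!info)
        return -1;
    info->argsz = (uint32_t)len;
    info->index = index;

    /* Second call: the buffer is now large enough to hold the capabilities. */
    if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info) == 0)
        ret = region_has_msix_mappable(info, len) ? 1 : 0;

    free(info);
    return ret;
}
```

Under these assumptions, a caller could run query_msix_mappable() on the BAR holding the MSI-X table and keep trapping the table when it returns 0. Splitting out the chain walker keeps the parsing testable without a real device.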