On Mon, May 31, 2021 at 04:02:02PM +0200, Thomas Gleixner wrote:
> On Sun, May 23 2021 at 20:50, Jason Gunthorpe wrote:
> > On Fri, May 21, 2021 at 05:20:37PM -0700, Dave Jiang wrote:
> >> @@ -77,8 +80,18 @@ int idxd_mdev_host_init(struct idxd_device *idxd, struct mdev_driver *drv)
> >>  		return rc;
> >>  	}
> >>
> >> +	ims_info.max_slots = idxd->ims_size;
> >> +	ims_info.slots = idxd->reg_base + idxd->ims_offset;
> >> +	idxd->ims_domain = pci_ims_array_create_msi_irq_domain(idxd->pdev, &ims_info);
> >> +	if (!idxd->ims_domain) {
> >> +		dev_warn(dev, "Fail to acquire IMS domain\n");
> >> +		iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX);
> >> +		return -ENODEV;
> >> +	}
> >
> > I'm quite surprised that every mdev doesn't create its own ims_domain
> > in its probe function.
>
> What for?

IDXD wouldn't need it, but proper IMS HW with no bound on the number of
vectors can't provide an ims_info.max_slots value here.

Instead each use site, like VFIO, would want to specify the number of
vectors to allocate for its own usage, then parcel them out one by one
in the normal way. Basically VFIO is emulating a normal MSI-X table.

> > This places a global total limit on the # of vectors which makes me
> > ask what was the point of using IMS in the first place ?
>
> That depends on how IMS is implemented. The IDXD variant has a fixed
> sized message store which is shared between all subdevices, so yet
> another domain would not provide any value.

Right, IDXD would have been perfectly happy to use the normal MSI-X
table from what I can see.

> For the case where the IMS store is separate, you still have one central
> irqdomain per physical device. The domain allocation function can then
> create storage on demand or reuse existing storage and just fill in the
> pointers.

I think it is philosophically backwards, and it is in part what is
motivating pretending this weird auxdomain and PASID stuff is generic.

The VFIO model is that the IRQ table is associated with a VM.
When the vfio_device is created it decides how big the MSI-X table will
be and it needs to allocate a block of interrupts to emulate it. For
security those interrupts need to be linked in the HW to the vfio_device
and the VM, i.e. VM A cannot trigger an interrupt that would deliver to
VM B.

IDXD chose to use the PASID, but other HW might use a generic VM_ID.

Further, IDXD chose to use a VM_ID per IMS entry, but other HW is
likely to use a VM_ID per block of IMS entries, i.e. the HW tree starts
at a VM object, then locates the IMS table for that object, then
triggers the interrupt.

If we think about the latter sort of HW I don't think the whole aux
data and domain per PCI function makes a lot of sense. You'd want a
domain per VM_ID, with all the IMS entries in that domain sharing the
same VM_ID. In this regard the irq domain will correspond to the
security boundary.

While IDXD is probably fine to organize its domains like this, I am
surprised to learn there is basically no reason for it to be using IMS.

Jason
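
To make the "domain per VM_ID" organization concrete, here is a minimal
userspace sketch in C. It is a toy model only, not kernel code and not
the irqdomain API: every name in it (vm_irq_domain, ims_entry,
vm_irq_domain_create and so on) is hypothetical and invented for
illustration. It assumes the use site picks the number of vectors when
it creates the domain, and that every entry inherits the domain's VM_ID,
so the domain itself is the security boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: one IMS entry, tagged with the owning VM. */
struct ims_entry {
	uint32_t vm_id;	/* inherited from the domain, never set per entry */
	int in_use;
};

/* Hypothetical model: one interrupt domain per VM_ID. */
struct vm_irq_domain {
	uint32_t vm_id;
	size_t nvec;	/* chosen by the use site, e.g. an emulated MSI-X table size */
	struct ims_entry *entries;
};

/* Create a domain with a caller-chosen vector count; NULL on failure. */
struct vm_irq_domain *vm_irq_domain_create(uint32_t vm_id, size_t nvec)
{
	struct vm_irq_domain *d;
	size_t i;

	if (nvec == 0)
		return NULL;
	d = malloc(sizeof(*d));
	if (!d)
		return NULL;
	d->entries = calloc(nvec, sizeof(*d->entries));
	if (!d->entries) {
		free(d);
		return NULL;
	}
	d->vm_id = vm_id;
	d->nvec = nvec;
	for (i = 0; i < nvec; i++)
		d->entries[i].vm_id = vm_id;	/* all entries share one VM_ID */
	return d;
}

/* Hand out vectors one by one; returns the index, or -1 when exhausted. */
int vm_irq_domain_alloc(struct vm_irq_domain *d)
{
	size_t i;

	for (i = 0; i < d->nvec; i++) {
		if (!d->entries[i].in_use) {
			d->entries[i].in_use = 1;
			return (int)i;
		}
	}
	return -1;
}

/* Returns 1 only if src_vm_id owns the domain and the entry is allocated. */
int vm_irq_may_trigger(const struct vm_irq_domain *d, uint32_t src_vm_id,
		       size_t idx)
{
	if (idx >= d->nvec || !d->entries[idx].in_use)
		return 0;
	return d->entries[idx].vm_id == src_vm_id;
}

void vm_irq_domain_destroy(struct vm_irq_domain *d)
{
	if (!d)
		return;
	free(d->entries);
	free(d);
}
```

In this model there is no global max_slots: each vm_irq_domain_create()
call sizes its own table, and vm_irq_may_trigger() refuses any source
whose VM_ID differs from the domain's, which is the "VM A cannot
deliver to VM B" property described above.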