On Sun, Sep 19, 2021 at 02:38:42PM +0800, Liu Yi L wrote:
> An I/O address space takes effect in the iommu only after it's attached
> by a device. This patch provides iommufd_device_[de/at]tach_ioasid()
> helpers for this purpose. One device can be only attached to one ioasid
> at this point, but one ioasid can be attached by multiple devices.
>
> The caller specifies the iommufd_device (returned at binding time) and
> the target ioasid when calling the helper function. Upon request, iommufd
> installs the specified I/O page table to the correct place in the IOMMU,
> according to the routing information (struct device* which represents
> RID) recorded in iommufd_device. Future variants could allow the caller
> to specify additional routing information (e.g. pasid/ssid) when multiple
> I/O address spaces are supported per device.
>
> Open:
> Per Jason's comment in below link, bus-specific wrappers are recommended.
> This RFC implements one wrapper for pci device. But it looks that struct
> pci_device is not used at all since iommufd_device already carries all
> necessary info. So want to have another discussion on its necessity, e.g.
> whether making more sense to have bus-specific wrappers for binding, while
> leaving a common attaching helper per iommufd_device.
> https://lore.kernel.org/linux-iommu/20210528233649.GB3816344@xxxxxxxxxx/
>
> TODO:
> When multiple devices are attached to a same ioasid, the permitted iova
> ranges and supported pgsize bitmap on this ioasid should be a common
> subset of all attached devices. iommufd needs to track such info per
> ioasid and update it every time when a new device is attached to the
> ioasid. This has not been done in this version yet, due to the temporary
> hack adopted in patch 16-18. The hack reuses vfio type1 driver which
> already includes the necessary logic for iova ranges and pgsize bitmap.
> Once we get a clear direction for those patches, that logic will be moved
> to this patch.
>
> Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx>
>
>  drivers/iommu/iommufd/iommufd.c | 226 ++++++++++++++++++++++++++++++++
>  include/linux/iommufd.h         |  29 ++++
>  2 files changed, 255 insertions(+)
>
> diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c
> index e45d76359e34..25373a0e037a 100644
> +++ b/drivers/iommu/iommufd/iommufd.c
> @@ -51,6 +51,19 @@ struct iommufd_ioas {
>          bool enforce_snoop;
>          struct iommufd_ctx *ictx;
>          refcount_t refs;
> +        struct mutex lock;
> +        struct list_head device_list;
> +        struct iommu_domain *domain;

This should just be another xarray indexed by the device id

> +/* Caller should hold ioas->lock */
> +static struct ioas_device_info *ioas_find_device(struct iommufd_ioas *ioas,
> +                                                 struct iommufd_device *idev)
> +{
> +        struct ioas_device_info *ioas_dev;
> +
> +        list_for_each_entry(ioas_dev, &ioas->device_list, next) {
> +                if (ioas_dev->idev == idev)
> +                        return ioas_dev;
> +        }

Which eliminates this search. xarray with tightly packed indexes is
generally more efficient than linked lists..
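Something like this completely untested sketch is what I mean -
'device_xa', 'dev_id' and the helper names are made up here, the index
would be whatever ID the device already got at bind time:

struct iommufd_ioas {
        /* ... existing members ... */
        struct xarray device_xa;  /* attached devices, indexed by dev_id */
};

static int ioas_add_device(struct iommufd_ioas *ioas,
                           struct iommufd_device *idev)
{
        void *old;

        /* Claims the slot atomically, so no separate duplicate check */
        old = xa_cmpxchg(&ioas->device_xa, idev->dev_id, NULL, idev,
                         GFP_KERNEL);
        if (xa_is_err(old))
                return xa_err(old);
        if (old)
                return -EBUSY;  /* already attached to this ioas */
        return 0;
}

static void ioas_del_device(struct iommufd_ioas *ioas,
                            struct iommufd_device *idev)
{
        xa_erase(&ioas->device_xa, idev->dev_id);
}

A plain xa_load() does the lookup where it is actually needed, without
walking a list.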
> +static int ioas_check_device_compatibility(struct iommufd_ioas *ioas,
> +                                           struct device *dev)
> +{
> +        bool snoop = false;
> +        u32 addr_width;
> +        int ret;
> +
> +        /*
> +         * currently we only support I/O page table with iommu enforce-snoop
> +         * format. Attaching a device which doesn't support this format in its
> +         * upstreaming iommu is rejected.
> +         */
> +        ret = iommu_device_get_info(dev, IOMMU_DEV_INFO_FORCE_SNOOP, &snoop);
> +        if (ret || !snoop)
> +                return -EINVAL;
> +
> +        ret = iommu_device_get_info(dev, IOMMU_DEV_INFO_ADDR_WIDTH, &addr_width);
> +        if (ret || addr_width < ioas->addr_width)
> +                return -EINVAL;
> +
> +        /* TODO: also need to check permitted iova ranges and pgsize bitmap */
> +
> +        return 0;
> +}

This seems kind of weird.. I expect the iommufd to hold a SW copy of the
IO page table and each time a new domain is to be created it should push
the SW copy into the domain. If the domain cannot support it then the
domain driver should naturally fail the request.

When the user changes the IO page table the SW copy is updated then all
of the domains are updated too - again if any domain cannot support the
change then it fails and the change is rolled back.

It seems like this is a side effect of roughly hacking in the vfio code?

> +
> +/**
> + * iommufd_device_attach_ioasid - attach device to an ioasid
> + * @idev:   [in] Pointer to struct iommufd_device.
> + * @ioasid: [in] ioasid points to an I/O address space.
> + *
> + * Returns 0 for successful attach, otherwise returns error.
> + *
> + */
> +int iommufd_device_attach_ioasid(struct iommufd_device *idev, int ioasid)

Types for the ioas_id again..

> +{
> +        struct iommufd_ioas *ioas;
> +        struct ioas_device_info *ioas_dev;
> +        struct iommu_domain *domain;
> +        int ret;
> +
> +        ioas = ioasid_get_ioas(idev->ictx, ioasid);
> +        if (!ioas) {
> +                pr_err_ratelimited("Trying to attach illegal or unkonwn IOASID %u\n", ioasid);
> +                return -EINVAL;

No prints triggered by bad userspace

> +        }
> +
> +        mutex_lock(&ioas->lock);
> +
> +        /* Check for duplicates */
> +        if (ioas_find_device(ioas, idev)) {
> +                ret = -EINVAL;
> +                goto out_unlock;
> +        }

just xa_cmpxchg NULL, XA_ZERO_ENTRY

> +        /*
> +         * Each ioas is backed by an iommu domain, which is allocated
> +         * when the ioas is attached for the first time and then shared
> +         * by following devices.
> +         */
> +        if (list_empty(&ioas->device_list)) {

Seems strange, what if the devices are forced to have different domains?
We don't want to model that in the SW layer..

> +                /* Install the I/O page table to the iommu for this device */
> +                ret = iommu_attach_device(domain, idev->dev);
> +                if (ret)
> +                        goto out_domain;

This is where things start to get confusing when you talk about PASID, as
the above call needs to be some PASID centric API.

> @@ -27,6 +28,16 @@ struct iommufd_device *
>  iommufd_bind_device(int fd, struct device *dev, u64 dev_cookie);
>  void iommufd_unbind_device(struct iommufd_device *idev);
>
> +int iommufd_device_attach_ioasid(struct iommufd_device *idev, int ioasid);
> +void iommufd_device_detach_ioasid(struct iommufd_device *idev, int ioasid);
> +
> +static inline int
> +__pci_iommufd_device_attach_ioasid(struct pci_dev *pdev,
> +                                   struct iommufd_device *idev, int ioasid)
> +{
> +        return iommufd_device_attach_ioasid(idev, ioasid);
> +}

I think since this is taking in the iommufd_device there isn't a logical
place to signal the PCIness.

But I think the API should at least have a kdoc saying that this is
capturing the entire device, and specify that for PCI this means all TLPs
with the RID.

Jason
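PS: to be a bit more concrete about the "SW copy" remark above, this is
roughly the flow I'd expect when a domain has to be created for an ioas.
Completely untested, and ioas_area/area_list and their fields are
invented names for illustration - the real thing would be whatever
structure iommufd uses to track userspace mappings:

static int ioas_populate_domain(struct iommufd_ioas *ioas,
                                struct iommu_domain *domain)
{
        struct ioas_area *area, *fail;
        int ret;

        /* Replay the SW copy of the IO page table into the new domain */
        list_for_each_entry(area, &ioas->area_list, next) {
                ret = iommu_map(domain, area->iova, area->paddr,
                                area->size, area->prot);
                if (ret)
                        goto out_unwind;
        }
        return 0;

out_unwind:
        /* Undo the partial replay if the driver rejected something */
        fail = area;
        list_for_each_entry(area, &ioas->area_list, next) {
                if (area == fail)
                        break;
                iommu_unmap(domain, area->iova, area->size);
        }
        return ret;
}

That replay is where a device whose iommu can't handle the existing
mappings would naturally fail, rather than in an explicit pre-check.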