On Sat, Nov 05, 2022 at 07:17:38AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > > Sent: Wednesday, October 26, 2022 2:12 AM > > > > +/** > > + * iommufd_device_bind - Bind a physical device to an iommu fd > > + * @ictx: iommufd file descriptor > > + * @dev: Pointer to a physical PCI device struct > > + * @id: Output ID number to return to userspace for this device > > + * > > + * A successful bind establishes an ownership over the device and returns > > + * struct iommufd_device pointer, otherwise returns error pointer. > > + * > > + * A driver using this API must set driver_managed_dma and must not touch > > + * the device until this routine succeeds and establishes ownership. > > + * > > + * Binding a PCI device places the entire RID under iommufd control. > > + * > > Then what about non-PCI device? clearer to say that calling this interface > just places the entire physical device under iommufd control We don't know what other bus types will do with this API. The above is guidance specifically for PCI, other bus types should add their own guidance as it evolves. > > + * FIXME: This is conceptually broken for iommufd since we want to > > allow > > + * userspace to change the domains, eg switch from an identity IOAS > > to a > > + * DMA IOAs. There is currently no way to create a MSI window that > > IOAs -> IOAS Done > > > + rc = iommu_get_msi_cookie(hwpt->domain, sw_msi_start); > > + if (rc && rc != -ENODEV) > > + return rc; > > I know this is copied from VFIO but a comment is appreciated why > -ENODEV is considered sane to move forward. Huh. I actually don't know. It looks like it is here to detect that the static inline for !CONFIG_IOMMU_DMA returns it. However, if we have !CONFIG_IOMMU_DMA then I think we also don't get interrupts at all. Ie iommu_dma_prepare_msi() becomes a NOP and nothing will map the ITS page into the iomm_domain. So let's drop the ENODEV check, it looks wrong. > > + /* > > + * Otherwise the platform has a MSI window that is not isolated. For > > + * historical compat with VFIO allow a module parameter to ignore > > the > > + * insecurity. > > + */ > > + if (!(flags & > > IOMMUFD_ATTACH_FLAGS_ALLOW_UNSAFE_INTERRUPT)) > > + return -EPERM; > > Throw out a warning similar to vfio. Ah... That is kind of gross.. But OK, we can fix it later if someone comes up with a driver that can safely use IOMMUFD_ATTACH_FLAGS_ALLOW_UNSAFE_INTERRUPT. @@ -167,6 +167,11 @@ static int iommufd_device_setup_msi(struct iommufd_device *idev, */ if (!(flags & IOMMUFD_ATTACH_FLAGS_ALLOW_UNSAFE_INTERRUPT)) return -EPERM; + else + dev_warn( + idev->dev, + "Device interrupts cannot be isolated by the IOMMU, this platform in insecure. Use an \"allow_unsafe_interrupts\" module > + return 0; > > + rc = iopt_table_enforce_group_resv_regions(&hwpt->ioas->iopt, > > idev->dev, > > + idev->group, > > &sw_msi_start); > > + if (rc) > > + goto out_iova; > > goto out_unlock since the function internally already called > __iopt_remove_reserved_iova() upon error. OK > > + /* > > + * There is no differentiation when domains are allocated, so any > > domain > > + * that is willing to attach to the device is interchangeable with any > > + * other. > > + */ > > + mutex_lock(&ioas->mutex); > > + list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) { > > + if (!hwpt->auto_domain || > > + !refcount_inc_not_zero(&hwpt->obj.users)) > > users cannot be zero at this point. > > a new hwpt is added to the list with users==1 in attach. > > detach first removes the hwpt and then decrement users. > > Both are conducted by holding ioas->mutex. > > + continue; > > + > > + rc = iommufd_device_do_attach(idev, hwpt, flags); > > + refcount_dec(&hwpt->obj.users); > > with above I also wonder whether refcount_inc/dec are just > redundant here... Yes, it seems so (in ealier versions this was not true, but it is now): @@ -267,12 +272,11 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev, */ mutex_lock(&ioas->mutex); list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) { - if (!hwpt->auto_domain || - !refcount_inc_not_zero(&hwpt->obj.users)) + if (!hwpt->auto_domain) continue; rc = iommufd_device_do_attach(idev, hwpt, flags); - refcount_dec(&hwpt->obj.users); + > > +int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id, > > + unsigned int flags) > > +{ > > + struct iommufd_object *pt_obj; > > + int rc; > > + > > + pt_obj = iommufd_get_object(idev->ictx, *pt_id, > > IOMMUFD_OBJ_ANY); > > + if (IS_ERR(pt_obj)) > > + return PTR_ERR(pt_obj); > > Is there a reason why get_object() is not required for idev? Caller has to hold a legal reference because caller is responsible for the pointer. > concurrent attach/unbind could lead to use-after-free here given users > for idev is added only in the end of the function. Caller bug. It is not allowed to continue to use idev while it is destroying it by calling iommufd_device_unbind() > > + default: > > + rc = -EINVAL; > > + goto out_put_pt_obj; > > + } > > + > > + refcount_inc(&idev->obj.users); > > + *pt_id = idev->hwpt->obj.id; > > What is the value of returning hwpt id of auto domain to user? It allows the userspace to potentially query the auto domain, if we decide on some query later. > IMHO this will add more complexity to the life cycle of auto domain. Not really, the usespace still controls the lifecylce, the hwpt id is returned and is valid until userspace commands the device it just bound to become unbound. > now if allowing user to populate auto domain similar to user-created > hwpt then detach also need get_object() to wait for concurrent > ioctls similar to iommufd_destroy() does and more trick on destroying > the object. "autodomain" means the hwpt is destroyed when its device list becomes empty. This is all protected by the ioas->mutex, so we don't need additional refcounting or locking. > > +void iommufd_device_detach(struct iommufd_device *idev) > > +{ > > + struct iommufd_hw_pagetable *hwpt = idev->hwpt; > > + > > + mutex_lock(&hwpt->ioas->mutex); > > + mutex_lock(&hwpt->devices_lock); > > + list_del(&idev->devices_item); > > + if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) { > > + if (list_empty(&hwpt->devices)) { > > + iopt_table_remove_domain(&hwpt->ioas->iopt, > > + hwpt->domain); > > + list_del(&hwpt->hwpt_item); > > + } > > + iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); > > this is always required. not just for last device in a group. Yep, good catch Thanks, Jason