On Tue, Oct 29, 2024 at 03:48:01PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 29, 2024 at 10:29:56AM -0700, Nicolin Chen wrote:
> > On Tue, Oct 29, 2024 at 12:58:24PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 25, 2024 at 04:50:30PM -0700, Nicolin Chen wrote:
> > > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > > > index 5fd3dd420290..e50113305a9c 100644
> > > > --- a/drivers/iommu/iommufd/device.c
> > > > +++ b/drivers/iommu/iommufd/device.c
> > > > @@ -277,6 +277,17 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> > > >   */
> > > >  void iommufd_device_unbind(struct iommufd_device *idev)
> > > >  {
> > > > +	u32 vdev_id = 0;
> > > > +
> > > > +	/* idev->vdev object should be destroyed prior, yet just in case.. */
> > > > +	mutex_lock(&idev->igroup->lock);
> > > > +	if (idev->vdev)
> > >
> > > Then should it have a WARN_ON here?
> >
> > It'd be a user space mistake that forgot to call the destroy ioctl
> > to the object, in which case I recall kernel shouldn't WARN_ON?
>
> But you can't get here because:
>
> 		refcount_inc(&idev->obj.users);
>
> And kernel doesn't destroy objects with elevated ref counts?

Hmm, this is not a ->destroy() but iommufd_device_unbind called
by VFIO. And we actually ran into this routine when QEMU didn't
destroy vdev. So, I added this chunk.

The iommufd_object_remove(vdev_id) here would destroy the vdev
where its destroy() does refcount_dec(&idev->obj.users). Then,
the following iommufd_object_destroy_user(.., &idev->obj) will
succeed.

With that said, let's just mandate userspace to destroy vdev.
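
To make the ordering above concrete, here is a toy userspace model. It
is only a sketch, not kernel code: the toy_* names are made up, and it
assumes an object can be destroyed only while its users count is
exactly 1. It shows why the idev cannot go away until the vdev has
dropped the reference it took on creation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy object, NOT the real struct iommufd_object. */
struct toy_obj {
	unsigned int users;	/* 1 while alive with no extra references */
	int destroyed;
};

/* Toy vdev that pins its idev for as long as it exists. */
struct toy_vdev {
	struct toy_obj obj;
	struct toy_obj *idev;
};

/* Assumption of this sketch: refuse to destroy while extra refs exist. */
static int toy_destroy(struct toy_obj *obj)
{
	if (obj->users != 1)
		return -EBUSY;
	obj->users = 0;
	obj->destroyed = 1;
	return 0;
}

static void toy_vdev_create(struct toy_vdev *vdev, struct toy_obj *idev)
{
	vdev->obj.users = 1;
	vdev->obj.destroyed = 0;
	vdev->idev = idev;
	idev->users++;		/* stands in for refcount_inc(&idev->obj.users) */
}

static int toy_vdev_destroy(struct toy_vdev *vdev)
{
	int ret = toy_destroy(&vdev->obj);

	if (ret)
		return ret;
	assert(vdev->idev->users > 1);
	vdev->idev->users--;	/* stands in for refcount_dec(&idev->obj.users) */
	return 0;
}
```

In this model, destroying the idev while the vdev still exists returns
-EBUSY, and it succeeds once the vdev has been destroyed first.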
> > > > +		vdev_id = idev->vdev->obj.id;
> > > > +	mutex_unlock(&idev->igroup->lock);
> > > > +	/* Relying on xa_lock against a race with iommufd_destroy() */
> > > > +	if (vdev_id)
> > > > +		iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
> > >
> > > That doesn't seem right, iommufd_object_remove() should never be used
> > > to destroy an object that userspace created with an IOCTL, in fact
> > > that just isn't allowed.
> >
> > It was for our auto destroy feature.
>
> auto domains are "hidden" hwpts that are kernel managed. They are not
> "userspace created".
>
> "Userspace created" objects are ones that userspace is expected to call
> destroy on.

OK. I misunderstood that.

> If you destroy them behind the scenes in the kernel then the object ID
> can be reallocated for something else and when userspace does DESTROY
> on the ID it thought was still allocated it will malfunction.
>
> So, only userspace can destroy objects that userspace created.

I see. That makes sense.

> > If user space forgot to destroy the object while trying to unplug
> > the device from VM. This saves the day.
>
> No, it should/does fail destroy of the VIOMMU object because the users
> refcount is elevated.

The vIOMMU object is refcount_dec also from the unbind() calling
remove(). But anyway, we aligned that userspace should destroy it
explicitly.

> > > Ugh, there is worse here, we can't hold a long term reference on a
> > > kernel owned object:
> > >
> > > 	idev->vdev = vdev;
> > > 	refcount_inc(&idev->obj.users);
> > >
> > > As it prevents the kernel from disconnecting it.
> >
> > Hmm, mind elaborating? I think the iommufd_fops_release() would
> > xa_for_each the object list that destroys the vdev object first
> > then this idev (and viommu too)?
>
> iommufd_device_unbind() can't fail, and if the object can't be
> destroyed because it has an elevated long term refcount it WARN's:
>
>
> 	ret = iommufd_object_remove(ictx, obj, obj->id, REMOVE_WAIT_SHORTTERM);
>
> 	/*
> 	 * If there is a bug and we couldn't destroy the object then we did put
> 	 * back the caller's users refcount and will eventually try to free it
> 	 * again during close.
> 	 */
> 	WARN_ON(ret);
>
> So you cannot take long term references on kernel owned objects. Only
> userspace owned objects.

OK. I think I had got this part. Gao ran into this WARN_ON at v3,
so I added iommufd_object_remove(vdev_id) in unbind() prior to
this iommufd_object_destroy_user(idev->ictx, &idev->obj).

> > OK. If user space forgot to destroy its vdev while unplugging the
> > device, it would not be allowed to hotplug another device (or the
> > same device) back to the same slot having the same RID, since the
> > RID on the vIOMMU would be occupied by the undestroyed vdev.
>
> Yes, that seems correct and obvious to me. Until the vdev is
> explicitly destroyed the ID is in-use.
>
> Good userspace should destroy the iommufd vDEVICE object before
> closing the VFIO file descriptor.
>
> If it doesn't, then the VDEVICE object remains even though the VFIO it
> was linked to is gone.

I see.

Thanks
Nicolin
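
For reference, the ID reuse hazard discussed in this thread can be
shown with a toy userspace model. This is a sketch under stated
assumptions, not iommufd code: the toy_* names and the fixed-size table
are hypothetical, and the allocator is assumed to hand out the lowest
free ID.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_IDS 8

enum toy_type { TOY_FREE = 0, TOY_VDEVICE, TOY_HWPT };

/* Toy ID table: index is the object ID, value is the object type. */
static enum toy_type toy_table[TOY_MAX_IDS];

/* Assumption: hand out the lowest free ID; 0 is reserved as invalid. */
static unsigned int toy_alloc(enum toy_type type)
{
	for (unsigned int id = 1; id < TOY_MAX_IDS; id++) {
		if (toy_table[id] == TOY_FREE) {
			toy_table[id] = type;
			return id;
		}
	}
	return 0;
}

/* Destroy by ID alone, the way a DESTROY request names its target. */
static int toy_destroy_id(unsigned int id)
{
	if (id == 0 || id >= TOY_MAX_IDS || toy_table[id] == TOY_FREE)
		return -1;
	toy_table[id] = TOY_FREE;
	return 0;
}

static enum toy_type toy_lookup(unsigned int id)
{
	return (id < TOY_MAX_IDS) ? toy_table[id] : TOY_FREE;
}
```

If the "kernel" side calls toy_destroy_id() on a vdevice ID behind
userspace's back and a later toy_alloc(TOY_HWPT) reuses that ID, a
stale toy_destroy_id() from "userspace" tears down the unrelated hwpt.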