Re: [PATCH v5 01/13] iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl

Nicolin Chen <nicolinc@xxxxxxxxxx> · Tue, 29 Oct 2024 12:30:00 -0700

On Tue, Oct 29, 2024 at 03:48:01PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 29, 2024 at 10:29:56AM -0700, Nicolin Chen wrote:
> > On Tue, Oct 29, 2024 at 12:58:24PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 25, 2024 at 04:50:30PM -0700, Nicolin Chen wrote:
> > > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > > > index 5fd3dd420290..e50113305a9c 100644
> > > > --- a/drivers/iommu/iommufd/device.c
> > > > +++ b/drivers/iommu/iommufd/device.c
> > > > @@ -277,6 +277,17 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> > > >   */
> > > >  void iommufd_device_unbind(struct iommufd_device *idev)
> > > >  {
> > > > +	u32 vdev_id = 0;
> > > > +
> > > > +	/* idev->vdev object should be destroyed prior, yet just in case.. */
> > > > +	mutex_lock(&idev->igroup->lock);
> > > > +	if (idev->vdev)
> > > 
> > > Then should it have a WARN_ON here?
> > 
> > It'd be a user space mistake, forgetting to call the destroy ioctl
> > on the object, in which case I recall the kernel shouldn't WARN_ON?
> 
> But you can't get here because:
> 
>  	refcount_inc(&idev->obj.users);
> 
> And kernel doesn't destroy objects with elevated ref counts?

Hmm, this is not a ->destroy() but iommufd_device_unbind(), called
by VFIO. And we actually ran into this routine when QEMU didn't
destroy the vdev. So, I added this chunk.

The iommufd_object_remove(vdev_id) here would destroy the vdev,
whose destroy() does refcount_dec(&idev->obj.users). Then, the
following iommufd_object_destroy_user(.., &idev->obj) will
succeed.

With that said, let's just mandate userspace to destroy vdev.
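
To spell out the ordering, here is a toy userspace model, not kernel
code. Every toy_* name is made up for illustration, and the rule that a
destroy only succeeds when users == 1 is a simplifying assumption of
the sketch rather than a description of the actual iommufd helpers:

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for an iommufd object: only the users count is modelled. */
struct toy_obj {
	int users;	/* 1 means only the creation reference is left */
	int destroyed;
};

struct toy_idev {
	struct toy_obj obj;
};

struct toy_vdev {
	struct toy_obj obj;
	struct toy_idev *idev;	/* the idev this vdev was allocated against */
};

void toy_idev_init(struct toy_idev *idev)
{
	idev->obj.users = 1;
	idev->obj.destroyed = 0;
}

/* Models the quoted refcount_inc(&idev->obj.users) at vdev allocation. */
void toy_vdev_alloc(struct toy_vdev *vdev, struct toy_idev *idev)
{
	vdev->obj.users = 1;
	vdev->obj.destroyed = 0;
	vdev->idev = idev;
	idev->obj.users++;
}

/* Assumption of this sketch: destroy fails while anyone else holds a ref. */
int toy_obj_destroy(struct toy_obj *obj)
{
	if (obj->destroyed)
		return -ENOENT;
	if (obj->users != 1)
		return -EBUSY;
	obj->users = 0;
	obj->destroyed = 1;
	return 0;
}

/* Destroying the vdev drops the reference it held on the idev. */
int toy_vdev_destroy(struct toy_vdev *vdev)
{
	int ret = toy_obj_destroy(&vdev->obj);

	if (!ret) {
		assert(vdev->idev->obj.users > 1);
		vdev->idev->obj.users--;
	}
	return ret;
}

/* Returns 0 if the ordering behaves as described in this thread. */
int toy_unbind_order_demo(void)
{
	struct toy_idev idev;
	struct toy_vdev vdev;

	toy_idev_init(&idev);
	toy_vdev_alloc(&vdev, &idev);

	/* Unbind first: the idev is still pinned by the vdev. */
	if (toy_obj_destroy(&idev.obj) != -EBUSY)
		return 1;
	/* Userspace destroys the vdev, after which the idev can go away. */
	if (toy_vdev_destroy(&vdev))
		return 2;
	if (toy_obj_destroy(&idev.obj))
		return 3;
	return 0;
}
```

In this model the only way to get rid of the idev is to destroy the
vdev first, which is the ordering userspace would be required to follow.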

> > > > +		vdev_id = idev->vdev->obj.id;
> > > > +	mutex_unlock(&idev->igroup->lock);
> > > > +	/* Relying on xa_lock against a race with iommufd_destroy() */
> > > > +	if (vdev_id)
> > > > +		iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
> > > 
> > > That doesn't seem right, iommufd_object_remove() should never be used
> > > to destroy an object that userspace created with an IOCTL, in fact
> > > that just isn't allowed.
> > 
> > It was for our auto destroy feature. 
> 
> auto domains are "hidden" hwpts that are kernel managed. They are not
> "userspace created".
> 
> "Userspace created" objects are ones that userspace is expected to call
> destroy on.

OK. I misunderstood that.

> If you destroy them behind the scenes in the kernel then the object ID
> can be reallocated for something else and when userspace does DESTROY
> on the ID it thought was still allocated it will malfunction.
> 
> So, only userspace can destroy objects that userspace created.

I see. That makes sense.
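
For my own notes, a toy model of that ID reuse hazard. This is plain
userspace C, not iommufd code. The toy_* names, the fixed-size table,
and the "lowest free ID first" policy are all assumptions made up for
the sketch:

```c
#include <assert.h>

#define TOY_MAX_OBJS 4

enum toy_type { TOY_FREE = 0, TOY_VDEVICE, TOY_HWPT };

/* Slot i holds the object with ID i + 1; ID 0 is never handed out. */
enum toy_type toy_table[TOY_MAX_OBJS];

/* Hand out the lowest free ID, so a freed ID is reused right away. */
unsigned int toy_alloc(enum toy_type type)
{
	for (unsigned int i = 0; i < TOY_MAX_OBJS; i++) {
		if (toy_table[i] == TOY_FREE) {
			toy_table[i] = type;
			return i + 1;
		}
	}
	return 0;
}

/* Destroy by ID alone: the request does not say what type is expected. */
int toy_destroy(unsigned int id)
{
	if (id == 0 || id > TOY_MAX_OBJS || toy_table[id - 1] == TOY_FREE)
		return -1;
	toy_table[id - 1] = TOY_FREE;
	return 0;
}

/* Returns the type of the object that the stale destroy really removed. */
enum toy_type toy_stale_destroy_demo(void)
{
	unsigned int vdev_id, hwpt_id;
	enum toy_type victim;

	vdev_id = toy_alloc(TOY_VDEVICE);	/* userspace creates a vdev */
	toy_destroy(vdev_id);			/* "kernel" removes it silently */
	hwpt_id = toy_alloc(TOY_HWPT);		/* something else is allocated */
	assert(hwpt_id == vdev_id);		/* ... and gets the same ID */

	victim = toy_table[vdev_id - 1];	/* what the stale ID names now */
	toy_destroy(vdev_id);			/* userspace destroys "its vdev" */
	return victim;
}
```

Here the final destroy removes the unrelated TOY_HWPT object, because
the ID that userspace still holds was handed out again in between.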

> > If user space forgot to destroy the object while trying to unplug
> > the device from the VM, this saves the day.
> 
> No, it should/does fail destroy of the VIOMMU object because the users
> refcount is elevated.

The vIOMMU object's refcount is also decremented by the remove()
call in unbind(). But anyway, we agreed that userspace should
destroy it explicitly.

> > > Ugh, there is worse here, we can't hold a long term reference on a
> > > kernel owned object:
> > > 
> > > 	idev->vdev = vdev;
> > > 	refcount_inc(&idev->obj.users);
> > > 
> > > As it prevents the kernel from disconnecting it.
> > 
> > Hmm, mind elaborating? I think the iommufd_fops_release() would
> > xa_for_each the object list that destroys the vdev object first
> > then this idev (and viommu too)?
> 
> iommufd_device_unbind() can't fail, and if the object can't be
> destroyed because it has an elevated long term refcount it WARN's:
> 
> 
> 	ret = iommufd_object_remove(ictx, obj, obj->id, REMOVE_WAIT_SHORTTERM);
> 
> 	/*
> 	 * If there is a bug and we couldn't destroy the object then we did put
> 	 * back the caller's users refcount and will eventually try to free it
> 	 * again during close.
> 	 */
> 	WARN_ON(ret);
> 
> So you cannot take long term references on kernel owned objects. Only
> userspace owned objects.

OK. I think I got this part. Gao ran into this WARN_ON in v3, so
I added iommufd_object_remove(vdev_id) to unbind(), ahead of the
iommufd_object_destroy_user(idev->ictx, &idev->obj) call.

> > OK. If user space forgot to destroy its vdev while unplugging the
> > device, it would not be allowed to hotplug another device (or the
> > same device) back to the same slot having the same RID, since the
> > RID on the vIOMMU would be occupied by the undestroyed vdev.
> 
> Yes, that seems correct and obvious to me. Until the vdev is
> explicitly destroyed the ID is in-use.
> 
> Good userspace should destroy the iommufd vDEVICE object before
> closing the VFIO file descriptor.
> 
> If it doesn't, then the VDEVICE object remains even though the VFIO it
> was linked to is gone.

I see.

Thanks
Nicolin