On Fri, Nov 18, 2022 at 05:27:35PM +0100, Eric Auger wrote: > > +config IOMMUFD > > + tristate "IOMMU Userspace API" > > + select INTERVAL_TREE > > + select INTERVAL_TREE_SPAN_ITER > > + select IOMMU_API > > + default n > > + help > > + Provides /dev/iommu the user API to control the IOMMU subsystem as > > + it relates to managing IO page tables that point at user space memory. > nit: missing ',' after /dev/iommu or Provides /dev/iommu user API Done > > +/** > > + * iommufd_ref_to_users() - Switch from destroy_rwsem to users refcount > > + * protection > > + * @obj - Object to release > > + * > > + * Objects have two refcount protections (destroy_rwsem and the refcount_t > > + * users). Holding either of these will prevent the object from being destroyed. > > + * > > + * Depending on the use case, one protection or the other is appropriate. In > > + * most cases references are being protected by the destroy_rwsem. This allows > > + * orderly destruction of the object because iommufd_object_destroy_user() will > > + * wait for it to become unlocked. However, as a rwsem, it cannot be held across > > + * a system call return. So cases that have longer term needs must switch > > + * to the weaker users refcount_t. > > + * > > + * With users protection iommufd_object_destroy_user() will return -EBUSY to > > iommufd_object_destroy_user() returns false and iommufd_destroy > retruns -EBUSY. "" * With users protection iommufd_object_destroy_user() will return false, * refusing to destroy the object, causing -EBUSY to userspace. */ "" > > > + * userspace and refuse to destroy the object. > > + */ > > +static inline void iommufd_ref_to_users(struct iommufd_object *obj) > > +{ > > + up_read(&obj->destroy_rwsem); > > + /* iommufd_lock_obj() obtains users as well */ > Do you have a way to check that put() is done in accordance, ie. we are > not going to try up_read() the rwsem if this latter is not used anymore? If put becomes unbalanced then fd closure will WARN_ON If someone misuses the rwsem (eg double up_reading it) then lockdep will fire > > +static int iommufd_fops_release(struct inode *inode, struct file *filp) > > +{ > > + struct iommufd_ctx *ictx = filp->private_data; > > + struct iommufd_object *obj; > > + > > + /* Destroy the graph from depth first */ > I would suggest: destroy the leaf objects first thanks to the > hierarchical user ref counting? or something alike "depth first" is a technical term when working with graphs.. How about replacing both comments with this: /* * The objects in the xarray form a graph of "users" counts, and we have * to destroy them in a depth first manner. Leaf objects will reduce the * users count of interior objects when they are destroyed. * * Repeatedly destroying all the "1 users" leaf objects will progress * until the entire list is destroyed. If this can't progress then there * is some bug related to object refcounting. */ > > + while (!xa_empty(&ictx->objects)) { > > + unsigned int destroyed = 0; > > + unsigned long index; > > + > > + xa_for_each(&ictx->objects, index, obj) { > > + /* > > + * Since we are in release elevated users must come from > > + * other objects holding the users. We will eventually > the sentense sounds a bit cryptic to me. > > + * destroy the object that holds this one and the next > > + * pass will progress it. > > + */ > > + if (!refcount_dec_if_one(&obj->users)) > > + continue; > > + destroyed++; > > + xa_erase(&ictx->objects, index); > > + iommufd_object_ops[obj->type].destroy(obj); > > + kfree(obj); > > Use iommufd_object_abort_and_destroy(obj) instead of the above 3 lines? Ah, they are not quite the same things, the order is different and abort has a protective assertion that the xa_array hasn't been messed with. It would be messy to merge them It is also very similar to iommufd_object_destroy_user() except we shortcut some unncessary locking. > > +/** > > + * DOC: General ioctl format > > + * > > + * The ioctl interface follows a general format to allow for extensibility. Each > > + * ioctl is passed in a structure pointer as the argument providing the size of > > + * the structure in the first u32. The kernel checks that any structure space > > + * beyond what it understands is 0. This allows userspace to use the backward > > + * compatible portion while consistently using the newer, larger, structures. > > + * > > + * ioctls use a standard meaning for common errnos: > > + * > > + * - ENOTTY: The IOCTL number itself is not supported at all > > + * - E2BIG: The IOCTL number is supported, but the provided structure has > > + * non-zero in a part the kernel does not understand. > > + * - EOPNOTSUPP: The IOCTL number is supported, and the structure is > > + * understood, however a known field has a value the kernel does not > > + * understand or support. > > + * - EINVAL: Everything about the IOCTL was understood, but a field is not > > + * correct. > > + * - ENOENT: An ID or IOVA provided does not exist. > > + * - ENOMEM: Out of memory. > > + * - EOVERFLOW: Mathematics oveflowed. > overflowed Done Thanks, Jason