Hi Jason, On 11/18/22 21:23, Jason Gunthorpe wrote: > On Fri, Nov 18, 2022 at 05:27:35PM +0100, Eric Auger wrote: >>> +config IOMMUFD >>> + tristate "IOMMU Userspace API" >>> + select INTERVAL_TREE >>> + select INTERVAL_TREE_SPAN_ITER >>> + select IOMMU_API >>> + default n >>> + help >>> + Provides /dev/iommu the user API to control the IOMMU subsystem as >>> + it relates to managing IO page tables that point at user space memory. >> nit: missing ',' after /dev/iommu or Provides /dev/iommu user API > Done > >>> +/** >>> + * iommufd_ref_to_users() - Switch from destroy_rwsem to users refcount >>> + * protection >>> + * @obj - Object to release >>> + * >>> + * Objects have two refcount protections (destroy_rwsem and the refcount_t >>> + * users). Holding either of these will prevent the object from being destroyed. >>> + * >>> + * Depending on the use case, one protection or the other is appropriate. In >>> + * most cases references are being protected by the destroy_rwsem. This allows >>> + * orderly destruction of the object because iommufd_object_destroy_user() will >>> + * wait for it to become unlocked. However, as a rwsem, it cannot be held across >>> + * a system call return. So cases that have longer term needs must switch >>> + * to the weaker users refcount_t. >>> + * >>> + * With users protection iommufd_object_destroy_user() will return -EBUSY to >> iommufd_object_destroy_user() returns false and iommufd_destroy >> retruns -EBUSY. > "" > * With users protection iommufd_object_destroy_user() will return false, > * refusing to destroy the object, causing -EBUSY to userspace. > */ > "" > >>> + * userspace and refuse to destroy the object. >>> + */ >>> +static inline void iommufd_ref_to_users(struct iommufd_object *obj) >>> +{ >>> + up_read(&obj->destroy_rwsem); >>> + /* iommufd_lock_obj() obtains users as well */ >> Do you have a way to check that put() is done in accordance, ie. we are >> not going to try up_read() the rwsem if this latter is not used anymore? > If put becomes unbalanced then fd closure will WARN_ON > > If someone misuses the rwsem (eg double up_reading it) then lockdep > will fire OK > >>> +static int iommufd_fops_release(struct inode *inode, struct file *filp) >>> +{ >>> + struct iommufd_ctx *ictx = filp->private_data; >>> + struct iommufd_object *obj; >>> + >>> + /* Destroy the graph from depth first */ >> I would suggest: destroy the leaf objects first thanks to the >> hierarchical user ref counting? or something alike > "depth first" is a technical term when working with graphs.. OK. I ignored that. > > How about replacing both comments with this: > > /* > * The objects in the xarray form a graph of "users" counts, and we have > * to destroy them in a depth first manner. Leaf objects will reduce the > * users count of interior objects when they are destroyed. > * > * Repeatedly destroying all the "1 users" leaf objects will progress > * until the entire list is destroyed. If this can't progress then there > * is some bug related to object refcounting. > */ Yes that looks much clearer to me. Thanks! > >>> + while (!xa_empty(&ictx->objects)) { >>> + unsigned int destroyed = 0; >>> + unsigned long index; >>> + >>> + xa_for_each(&ictx->objects, index, obj) { >>> + /* >>> + * Since we are in release elevated users must come from >>> + * other objects holding the users. We will eventually >> the sentense sounds a bit cryptic to me. >>> + * destroy the object that holds this one and the next >>> + * pass will progress it. >>> + */ >>> + if (!refcount_dec_if_one(&obj->users)) >>> + continue; >>> + destroyed++; >>> + xa_erase(&ictx->objects, index); >>> + iommufd_object_ops[obj->type].destroy(obj); >>> + kfree(obj); >> Use iommufd_object_abort_and_destroy(obj) instead of the above 3 lines? > Ah, they are not quite the same things, the order is different and > abort has a protective assertion that the xa_array hasn't been messed > with. It would be messy to merge them > > It is also very similar to iommufd_object_destroy_user() except we > shortcut some unncessary locking. OK >>> +/** >>> + * DOC: General ioctl format >>> + * >>> + * The ioctl interface follows a general format to allow for extensibility. Each >>> + * ioctl is passed in a structure pointer as the argument providing the size of >>> + * the structure in the first u32. The kernel checks that any structure space >>> + * beyond what it understands is 0. This allows userspace to use the backward >>> + * compatible portion while consistently using the newer, larger, structures. >>> + * >>> + * ioctls use a standard meaning for common errnos: >>> + * >>> + * - ENOTTY: The IOCTL number itself is not supported at all >>> + * - E2BIG: The IOCTL number is supported, but the provided structure has >>> + * non-zero in a part the kernel does not understand. >>> + * - EOPNOTSUPP: The IOCTL number is supported, and the structure is >>> + * understood, however a known field has a value the kernel does not >>> + * understand or support. >>> + * - EINVAL: Everything about the IOCTL was understood, but a field is not >>> + * correct. >>> + * - ENOENT: An ID or IOVA provided does not exist. >>> + * - ENOMEM: Out of memory. >>> + * - EOVERFLOW: Mathematics oveflowed. >> overflowed > Done > > Thanks, > Jason > Thanks Eric