> From: Jason Gunthorpe <jgg@xxxxxxxxxx> > Sent: Tuesday, September 21, 2021 11:42 PM > > On Sun, Sep 19, 2021 at 02:38:29PM +0800, Liu Yi L wrote: > > /dev/iommu aims to provide a unified interface for managing I/O address > > spaces for devices assigned to userspace. This patch adds the initial > > framework to create a /dev/iommu node. Each open of this node returns an > > iommufd. And this fd is the handle for userspace to initiate its I/O > > address space management. > > > > One open: > > - We call this feature as IOMMUFD in Kconfig in this RFC. However this > > name is not clear enough to indicate its purpose to user. Back to 2010 > > vfio even introduced a /dev/uiommu [1] as the predecessor of its > > container concept. Is that a better name? Appreciate opinions here. > > > > [1] > https://lore.kernel.org/kvm/4c0eb470.1HMjondO00NIvFM6%25pugs@xxxxxxxx > m/ > > > > Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx> > > drivers/iommu/Kconfig | 1 + > > drivers/iommu/Makefile | 1 + > > drivers/iommu/iommufd/Kconfig | 11 ++++ > > drivers/iommu/iommufd/Makefile | 2 + > > drivers/iommu/iommufd/iommufd.c | 112 > ++++++++++++++++++++++++++++++++ > > 5 files changed, 127 insertions(+) > > create mode 100644 drivers/iommu/iommufd/Kconfig > > create mode 100644 drivers/iommu/iommufd/Makefile > > create mode 100644 drivers/iommu/iommufd/iommufd.c > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > > index 07b7c25cbed8..a83ce0acd09d 100644 > > +++ b/drivers/iommu/Kconfig > > @@ -136,6 +136,7 @@ config MSM_IOMMU > > > > source "drivers/iommu/amd/Kconfig" > > source "drivers/iommu/intel/Kconfig" > > +source "drivers/iommu/iommufd/Kconfig" > > > > config IRQ_REMAP > > bool "Support for Interrupt Remapping" > > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > > index c0fb0ba88143..719c799f23ad 100644 > > +++ b/drivers/iommu/Makefile > > @@ -29,3 +29,4 @@ obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o > > obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o > > obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o > > obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o > > +obj-$(CONFIG_IOMMUFD) += iommufd/ > > diff --git a/drivers/iommu/iommufd/Kconfig > b/drivers/iommu/iommufd/Kconfig > > new file mode 100644 > > index 000000000000..9fb7769a815d > > +++ b/drivers/iommu/iommufd/Kconfig > > @@ -0,0 +1,11 @@ > > +# SPDX-License-Identifier: GPL-2.0-only > > +config IOMMUFD > > + tristate "I/O Address Space management framework for passthrough > devices" > > + select IOMMU_API > > + default n > > + help > > + provides unified I/O address space management framework for > > + isolating untrusted DMAs via devices which are passed through > > + to userspace drivers. > > + > > + If you don't know what to do here, say N. > > diff --git a/drivers/iommu/iommufd/Makefile > b/drivers/iommu/iommufd/Makefile > > new file mode 100644 > > index 000000000000..54381a01d003 > > +++ b/drivers/iommu/iommufd/Makefile > > @@ -0,0 +1,2 @@ > > +# SPDX-License-Identifier: GPL-2.0-only > > +obj-$(CONFIG_IOMMUFD) += iommufd.o > > diff --git a/drivers/iommu/iommufd/iommufd.c > b/drivers/iommu/iommufd/iommufd.c > > new file mode 100644 > > index 000000000000..710b7e62988b > > +++ b/drivers/iommu/iommufd/iommufd.c > > @@ -0,0 +1,112 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* > > + * I/O Address Space Management for passthrough devices > > + * > > + * Copyright (C) 2021 Intel Corporation > > + * > > + * Author: Liu Yi L <yi.l.liu@xxxxxxxxx> > > + */ > > + > > +#define pr_fmt(fmt) "iommufd: " fmt > > + > > +#include <linux/file.h> > > +#include <linux/fs.h> > > +#include <linux/module.h> > > +#include <linux/slab.h> > > +#include <linux/miscdevice.h> > > +#include <linux/mutex.h> > > +#include <linux/iommu.h> > > + > > +/* Per iommufd */ > > +struct iommufd_ctx { > > + refcount_t refs; > > +}; > > A private_data of a struct file should avoid having a refcount (and > this should have been a kref anyhow) > > Use the refcount on the struct file instead. > > In general the lifetime models look overly convoluted to me with > refcounts being used as locks and going in all manner of directions. > > - No refcount on iommufd_ctx, this should use the fget on the fd. > The driver facing version of the API has the driver holds a fget > inside the iommufd_device. > > - Put a rwlock inside the iommufd_ioas that is a > 'destroying_lock'. The rwlock starts out unlocked. > > Acquire from the xarray is > rcu_lock() > ioas = xa_load() > if (ioas) > if (down_read_trylock(&ioas->destroying_lock)) all good suggestions, will refine accordingly. Here destroying_lock is a rw_semaphore. right? Since down_read_trylock() accepts a rwsem. Thanks, Yi Liu > // success > Unacquire is just up_read() > > Do down_write when the ioas is to be destroyed, do not return ebusy. > > - Delete the iommufd_ctx->lock. Use RCU to protect load, erase/alloc does > not need locking (order it properly too, it is in the wrong order), and > don't check for duplicate devices or dev_cookie duplication, that > is user error and is harmless to the kernel. > > > +static int iommufd_fops_release(struct inode *inode, struct file *filep) > > +{ > > + struct iommufd_ctx *ictx = filep->private_data; > > + > > + filep->private_data = NULL; > > unnecessary > > > + iommufd_ctx_put(ictx); > > + > > + return 0; > > +} > > + > > +static long iommufd_fops_unl_ioctl(struct file *filep, > > + unsigned int cmd, unsigned long arg) > > +{ > > + struct iommufd_ctx *ictx = filep->private_data; > > + long ret = -EINVAL; > > + > > + if (!ictx) > > + return ret; > > impossible > > > + > > + switch (cmd) { > > + default: > > + pr_err_ratelimited("unsupported cmd %u\n", cmd); > > don't log user triggerable events > > Jason