On Mon, Jul 01, 2024 at 03:54:19PM +0800, Yi Liu wrote: > On 2024/6/17 17:53, Yan Zhao wrote: > > Reuse file f_inode as vfio device inode and associate pseudo path file > > directly to inode allocated in vfio fs. > > > > Currently, vfio device is opened via 2 ways: > > 1) via cdev open > > vfio device is opened with a cdev device with file f_inode and address > > space associated with a cdev inode; > > 2) via VFIO_GROUP_GET_DEVICE_FD ioctl > > vfio device is opened via a pseudo path file with file f_inode and > > address space associated with an inode in anon_inode_fs. > > > > You can simply say the cdev path and group path. :) Ok, if they are well-known terms. > > > In commit b7c5e64fecfa ("vfio: Create vfio_fs_type with inode per device"), > > an inode in vfio fs is allocated for each vfio device. However, this inode > > in vfio fs is only used to assign its address space to that of a file > > associated with another cdev inode or an inode in anon_inode_fs. > > > > This patch > > - reuses cdev device inode as the vfio device inode when it's opened via > > cdev way; > > - allocates an inode in vfio fs, associate it to the pseudo path file, > > and save it as the vfio device inode when the vfio device is opened via > > VFIO_GROUP_GET_DEVICE_FD ioctl. > > So Alex's prior series only makes use of the i_mapping of the inode instead > of associating the inode with the pseudo path file? Right. > > File address space will then point automatically to the address space of > > the vfio device inode. Tools like unmap_mapping_range() can then zap all > > vmas associated with the vfio device. > > > > Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx> > > --- > > drivers/vfio/device_cdev.c | 9 ++++--- > > drivers/vfio/group.c | 21 ++-------------- > > drivers/vfio/vfio.h | 2 ++ > > drivers/vfio/vfio_main.c | 49 +++++++++++++++++++++++++++----------- > > 4 files changed, 43 insertions(+), 38 deletions(-) > > > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c > > index bb1817bd4ff3..a4eec8e88f5c 100644 > > --- a/drivers/vfio/device_cdev.c > > +++ b/drivers/vfio/device_cdev.c > > @@ -40,12 +40,11 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep) > > filep->private_data = df; > > /* > > - * Use the pseudo fs inode on the device to link all mmaps > > - * to the same address space, allowing us to unmap all vmas > > - * associated to this device using unmap_mapping_range(). > > + * mmaps are linked to the address space of the inode of device cdev. > > + * Save the inode of device cdev in device->inode to allow > > + * unmap_mapping_range() to unmap all vmas. > > */ > > - filep->f_mapping = device->inode->i_mapping; > > - > > + device->inode = inode; > > return 0; > > err_put_registration: > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c > > index ded364588d29..aaef188003b6 100644 > > --- a/drivers/vfio/group.c > > +++ b/drivers/vfio/group.c > > @@ -268,31 +268,14 @@ static struct file *vfio_device_open_file(struct vfio_device *device) > > if (ret) > > goto err_free; > > - /* > > - * We can't use anon_inode_getfd() because we need to modify > > - * the f_mode flags directly to allow more than just ioctls > > - */ > > - filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops, > > - df, O_RDWR); > > + filep = vfio_device_get_pseudo_file(device); > > if (IS_ERR(filep)) { > > ret = PTR_ERR(filep); > > goto err_close_device; > > } > > - > > - /* > > - * TODO: add an anon_inode interface to do this. > > - * Appears to be missing by lack of need rather than > > - * explicitly prevented. Now there's need. > > - */ > > + filep->private_data = df; > > filep->f_mode |= (FMODE_PREAD | FMODE_PWRITE); > > - /* > > - * Use the pseudo fs inode on the device to link all mmaps > > - * to the same address space, allowing us to unmap all vmas > > - * associated to this device using unmap_mapping_range(). > > - */ > > - filep->f_mapping = device->inode->i_mapping; > > - > > if (device->group->type == VFIO_NO_IOMMU) > > dev_warn(device->dev, "vfio-noiommu device opened by user " > > "(%s:%d)\n", current->comm, task_pid_nr(current)); > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h > > index 50128da18bca..1f8915f79fbb 100644 > > --- a/drivers/vfio/vfio.h > > +++ b/drivers/vfio/vfio.h > > @@ -35,6 +35,7 @@ struct vfio_device_file * > > vfio_allocate_device_file(struct vfio_device *device); > > extern const struct file_operations vfio_device_fops; > > +struct file *vfio_device_get_pseudo_file(struct vfio_device *device); > > #ifdef CONFIG_VFIO_NOIOMMU > > extern bool vfio_noiommu __read_mostly; > > @@ -420,6 +421,7 @@ static inline void vfio_cdev_cleanup(void) > > { > > } > > #endif /* CONFIG_VFIO_DEVICE_CDEV */ > > +struct file *vfio_device_get_pseduo_file(struct vfio_device *device); > > #if IS_ENABLED(CONFIG_VFIO_VIRQFD) > > int __init vfio_virqfd_init(void); > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c > > index a5a62d9d963f..e81d0f910c70 100644 > > --- a/drivers/vfio/vfio_main.c > > +++ b/drivers/vfio/vfio_main.c > > @@ -192,7 +192,6 @@ static void vfio_device_release(struct device *dev) > > if (device->ops->release) > > device->ops->release(device); > > - iput(device->inode); > > simple_release_fs(&vfio.vfs_mount, &vfio.fs_count); > > kvfree(device); > > } > > @@ -248,20 +247,50 @@ static struct file_system_type vfio_fs_type = { > > .kill_sb = kill_anon_super, > > }; > > -static struct inode *vfio_fs_inode_new(void) > > +/* > > + * Alloc pseudo file from inode associated of vfio.vfs_mount. > > nit: s/Alloc/Allocate/ and s/of/with/ > > > + * This is called when vfio device is opened via pseudo file. > > group path might be better. Is this pseudo file only needed for the device > files opened in the group path? If so, might be helpful to move the related > codes into group.c. Yes. I also planed to move the this to group.c in v2 :) > > > + * mmaps are linked to the address space of the inode of the pseudo file. > > + * Save the inode in device->inode for unmap_mapping_range() to unmap all vmas. > > + */ > > +struct file *vfio_device_get_pseudo_file(struct vfio_device *device) > > { > > + const struct file_operations *fops = &vfio_device_fops; > > struct inode *inode; > > + struct file *filep; > > int ret; > > + if (!fops_get(fops)) > > + return ERR_PTR(-ENODEV); > > + > > ret = simple_pin_fs(&vfio_fs_type, &vfio.vfs_mount, &vfio.fs_count); > > if (ret) > > - return ERR_PTR(ret); > > + goto err_pin_fs; > > inode = alloc_anon_inode(vfio.vfs_mount->mnt_sb); > > - if (IS_ERR(inode)) > > - simple_release_fs(&vfio.vfs_mount, &vfio.fs_count); > > + if (IS_ERR(inode)) { > > + ret = PTR_ERR(inode); > > + goto err_inode; > > + } > > + > > + filep = alloc_file_pseudo(inode, vfio.vfs_mount, "[vfio-device]", > > + O_RDWR, fops); > > + > > + if (IS_ERR(filep)) { > > + ret = PTR_ERR(filep); > > + goto err_file; > > + } > > + device->inode = inode; > > The group path allows multiple device fd get, hence this will set the > device->inode multiple times. It does not look good. Setting it once > should be enough? Will make the multipl opens to use the same vfio inode, as in [1]. [1] https://lore.kernel.org/all/Zn02BUdJ7kvOg6Vw@xxxxxxxxxxxxxxxxxxxxxxxxx/ > > > + return filep; > > + > > +err_file: > > + iput(inode); > > If the vfio_device_get_pseudo_file() succeeds, who will put inode? I > noticed all the other iput() of this file are removed. On success, the inode ref count is moved to the file and put in vfio_device_fops_release(). The prevous iput() is only required because that inode is not associated with any file. > > +err_inode: > > + simple_release_fs(&vfio.vfs_mount, &vfio.fs_count); > > +err_pin_fs: > > + fops_put(fops); > > - return inode; > > + return ERR_PTR(ret); > > } > > /* > > @@ -282,11 +311,6 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev, > > init_completion(&device->comp); > > device->dev = dev; > > device->ops = ops; > > - device->inode = vfio_fs_inode_new(); > > - if (IS_ERR(device->inode)) { > > - ret = PTR_ERR(device->inode); > > - goto out_inode; > > - } > > if (ops->init) { > > ret = ops->init(device); > > @@ -301,9 +325,6 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev, > > return 0; > > out_uninit: > > - iput(device->inode); > > - simple_release_fs(&vfio.vfs_mount, &vfio.fs_count); > > -out_inode: > > vfio_release_device_set(device); > > ida_free(&vfio.device_ida, device->index); > > return ret; > > > > base-commit: 6ba59ff4227927d3a8530fc2973b80e94b54d58f > > -- > Regards, > Yi Liu