Hi Alex, I've two new patches to address the comment in this patch. If it makes sense then I'll put them in cdev v12. > From: Liu, Yi L <yi.l.liu@xxxxxxxxx> > Sent: Tuesday, May 23, 2023 10:13 AM > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx> > > Sent: Tuesday, May 23, 2023 7:04 AM > > > > On Sat, 13 May 2023 06:28:25 -0700 > > Yi Liu <yi.l.liu@xxxxxxxxx> wrote: > > > > > This is to make the cdev path and group path consistent for the noiommu > > > devices registration. If vfio_noiommu is disabled, such registration > > > should fail. However, this check is vfio_device_set_group() which is part > > > of the vfio_group code. If the vfio_group code is compiled out, noiommu > > > devices would be registered even vfio_noiommu is disabled. > > > > > > This adds vfio_device_set_noiommu() which can fail and calls it in the > > > device registration. For now, it never fails as long as > > > vfio_device_set_group() is successful. But when the vfio_group code is > > > compiled out, vfio_device_set_noiommu() would fail the noiommu devices > > > when vfio_noiommu is disabled. > > > > I'm lost. After the next patch we end up with the following when > > CONFIG_VFIO_GROUP is set: > > > > static inline int vfio_device_set_noiommu(struct vfio_device *device) > > { > > device->noiommu = IS_ENABLED(CONFIG_VFIO_NOIOMMU) && > > device->group->type == VFIO_NO_IOMMU; > > return 0; > > } > > > > I think this is relying on the fact that vfio_device_set_group() which > > is called immediately prior to this function would have performed the > > testing for noiommu and failed prior to this function being called and > > therefore there is no error return here. > > Yes. Remove the IS_ENABLED(CONFIG_VFIO_NOIOMMU) check: >From 3e93d33dc426350389a89130557a212cf370fee6 Mon Sep 17 00:00:00 2001 From: Yi Liu <yi.l.liu@xxxxxxxxx> Date: Tue, 23 May 2023 20:48:08 -0700 Subject: [PATCH 19/23] vfio: Only check group->type for noiommu test group->type can be VFIO_NO_IOMMU only when vfio_noiommu option is true. And vfio_noiommu option can only be true if CONFIG_VFIO_NOIOMMU is enabled. So checking group->type is enough when testing noiommu. Signed-off-by: Yi Liu <yi.l.liu@xxxxxxxxx> --- drivers/vfio/group.c | 3 +-- drivers/vfio/vfio.h | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index c930406cc261..3b56959fcdbb 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -133,8 +133,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group, iommufd = iommufd_ctx_from_file(f.file); if (!IS_ERR(iommufd)) { - if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && - group->type == VFIO_NO_IOMMU) + if (group->type == VFIO_NO_IOMMU) ret = iommufd_vfio_compat_set_no_iommu(iommufd); else ret = iommufd_vfio_compat_ioas_create(iommufd); diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index 0f66d0934e91..104c2ee93833 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -108,8 +108,7 @@ void vfio_group_cleanup(void); static inline bool vfio_device_is_noiommu(struct vfio_device *vdev) { - return IS_ENABLED(CONFIG_VFIO_NOIOMMU) && - vdev->group->type == VFIO_NO_IOMMU; + return vdev->group->type == VFIO_NO_IOMMU; } #if IS_ENABLED(CONFIG_VFIO_CONTAINER) -- 2.34.1 > > > > Note also here that I think CONFIG_VFIO_NOIOMMU was only being tested > > here previously so that a smart enough compiler would optimize out the > > entire function, we can never set a VFIO_NO_IOMMU type when > > !CONFIG_VFIO_NOIOMMU. > > You are right. VFIO_NO_IOMMU type is only set when vfio_noiommu==true. > > > That's no longer the case if the function is > > refactored like this. > > > > When !CONFIG_VFIO_GROUP: > > > > static inline int vfio_device_set_noiommu(struct vfio_device *device) > > { > > struct iommu_group *iommu_group; > > > > iommu_group = iommu_group_get(device->dev); > > if (!iommu_group) { > > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu) > > return -EINVAL; > > device->noiommu = true; > > } else { > > iommu_group_put(iommu_group); > > device->noiommu = false; > > } > > > > return 0; > > } > > > > Here again, the NOIOMMU config option is irrelevant, vfio_noiommu can > > only be true if the config option is enabled. Therefore if there's no > > IOMMU group and the module option to enable noiommu is not set, return > > an error. > > Yes. I think I missed the part that the vfio_noiommu option can only be > true when CONFIG_VFIO_NOIOMMU is enabled. That's why the check > is "IS_ENABLED(CONFIG_VFIO_NOIOMMU) && device->group->type == > VFIO_NO_IOMMU;". > This appears that the two conditions are orthogonal. > > > > > It's really quite ugly that in one mode we rely on this function to > > generate the error and in the other mode it happens prior to getting > > here. > > > > The above could be simplified to something like: > > > > iommu_group = iommu_group_get(device->dev); > > if (!iommu_group && !vfio_iommu) > > return -EINVAL; > > > > device->noiommu = !iommu_group; > > iommu_group_put(iommu_group); /* Accepts NULL */ > > return 0; > > > > Which would actually work regardless of CONFIG_VFIO_GROUP, where maybe > > this could then be moved before vfio_device_set_group() and become the > > de facto exit point for invalid noiommu configurations and maybe we > > could remove the test from the group code (with a comment to note that > > it's been tested prior)? Thanks, > > Yes, this looks much clearer. I think this new logic must be called before > vfio_device_set_group(). Otherwise, iommu_group_get () may return > the faked groups. Then it would need to work differently per CONFIG_VFIO_GROUP > configuration. Below patch adopts your suggestion on noiommu determination. It also moves the noiommu taint in vfio_device_set_noiommu(). >From 757a168acca7e94beec4448fba4600155569d823 Mon Sep 17 00:00:00 2001 From: Yi Liu <yi.l.liu@xxxxxxxxx> Date: Tue, 9 May 2023 01:55:28 -0700 Subject: [PATCH 20/23] vfio: Determine noiommu device in __vfio_register_dev() This moves the noiommu device determination and noiommu taint out of vfio_group_find_or_alloc(). This is also helpful for compiling out vfio_group infrastructure when vfio device cdev is added as noiommu determination is common between the cdev path and group path. Suggested-by: Alex Williamson <alex.williamson@xxxxxxxxxx> Signed-off-by: Yi Liu <yi.l.liu@xxxxxxxxx> --- drivers/vfio/group.c | 18 +++--------------- drivers/vfio/vfio_main.c | 25 +++++++++++++++++++++++++ include/linux/vfio.h | 1 + 3 files changed, 29 insertions(+), 15 deletions(-) diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index 3b56959fcdbb..9ee6e70531d3 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -670,21 +670,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev) struct vfio_group *group; iommu_group = iommu_group_get(dev); - if (!iommu_group && vfio_noiommu) { - /* - * With noiommu enabled, create an IOMMU group for devices that - * don't already have one, implying no IOMMU hardware/driver - * exists. Taint the kernel because we're about to give a DMA - * capable device to a user without IOMMU protection. - */ - group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU); - if (!IS_ERR(group)) { - add_taint(TAINT_USER, LOCKDEP_STILL_OK); - dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n"); - } - return group; - } - if (!iommu_group) return ERR_PTR(-EINVAL); @@ -720,6 +705,9 @@ int vfio_device_set_group(struct vfio_device *device, { struct vfio_group *group; + if (device->noiommu) + type = VFIO_NO_IOMMU; + if (type == VFIO_IOMMU) group = vfio_group_find_or_alloc(device->dev); else diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 20f463088e71..da46e2e74642 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -266,6 +266,18 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev, return ret; } +static int vfio_device_set_noiommu(struct vfio_device *device) +{ + struct iommu_group *iommu_group = iommu_group_get(device->dev); + + if (!iommu_group && !vfio_noiommu) + return -EINVAL; + + device->noiommu = !iommu_group; + iommu_group_put(iommu_group); /* Accepts NULL */ + return 0; +} + static int __vfio_register_dev(struct vfio_device *device, enum vfio_group_type type) { @@ -285,6 +297,10 @@ static int __vfio_register_dev(struct vfio_device *device, if (!device->dev_set) vfio_assign_device_set(device, device); + ret = vfio_device_set_noiommu(device); + if (ret) + return ret; + ret = vfio_device_set_group(device, type); if (ret) return ret; @@ -303,6 +319,15 @@ static int __vfio_register_dev(struct vfio_device *device, vfio_device_group_register(device); + if (device->noiommu) { + /* + * noiommu deivces have no IOMMU hardware/driver. Taint the + * kernel because we're about to give a DMA capable device to + * a user without IOMMU protection. + */ + add_taint(TAINT_USER, LOCKDEP_STILL_OK); + dev_warn(device->dev, "Adding kernel taint for vfio-noiommu on device\n"); + } return 0; err_out: vfio_device_remove_group(device); diff --git a/include/linux/vfio.h b/include/linux/vfio.h index e80a8ac86e46..183e620009e7 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -67,6 +67,7 @@ struct vfio_device { bool iommufd_attached; #endif bool cdev_opened:1; + bool noiommu:1; }; /** -- 2.34.1