On Wed, Nov 09, 2022 at 10:18:09AM -0700, Alex Williamson wrote: > DPDK supports no-iommu mode. Er? Huh? How? I thought no-iommu was for applications that didn't do DMA? How is DPDK getting packets in/out without DMA? I guess it snoops in /proc/ or something to learn PFNs of mlock'd memory? <shudder> > I agree that it's very useful for testing, I'm certainly not suggesting > to give up, but I'm not sure where no-iommu lives when iommufd owns > /dev/vfio/vfio. Given the unsafe interrupts discussion, it doesn't > seem like the type of thing that would be a priority for iommufd. Ah, I think the bit below will do the job, I'm not sure without doing some testing (and I don't think I have the necessary Intel NIC for DPDK). The basic point is no-iommu literally means 'do not use iommufd at all, do not create an iommufd_device or an iommufd_access'. VFIO can easially do that on its own. The only slightly messy bit is that uAPI requires the SET_CONTAINER which we can just NOP in vfio_compat. With more checks it could have higher fidelity of error cases, but not sure it is worth it. When we talk about the device cdev flow then I suggest that no-iommu simply requires -1 for the iommufd during bind - ie no iommufd is used or accepted and that is how the userspace knows/confirms it is in no-iommu mode. > We're on a path where vfio accepts an iommufd as a container, and > ultimately iommufd becomes the container provider, supplanting the > IOMMU driver registration aspect of vfio. I absolutely want type1 and > spapr backends to get replaced by iommufd, but reluctance to support > aspects of vfio "legacy" behavior doesn't give me warm fuzzies about a > wholesale hand-off of the container to a different subsystem, for > example vs an iommufd shim spoofing type1 support. Well, I will agree to do everything required, just let's go through the process to understand the security situation and ensure we are still doing things the right way. > Unfortunately we no longer have a CONFIG_EXPERIMENTAL option to hide > behind for disabling VFIO_CONTAINER, so regardless of our intentions > that a transition is some time off, it may become an issue sooner than > we expect. We can add kconfig text discouraging that? diff --git a/drivers/iommu/iommufd/vfio_compat.c b/drivers/iommu/iommufd/vfio_compat.c index dbef3274803336..81f7594cfff8e0 100644 --- a/drivers/iommu/iommufd/vfio_compat.c +++ b/drivers/iommu/iommufd/vfio_compat.c @@ -259,6 +259,14 @@ static int iommufd_vfio_set_iommu(struct iommufd_ctx *ictx, unsigned long type) struct iommufd_ioas *ioas = NULL; int rc = 0; + /* + * Emulation for NOIMMU is imperfect in that VFIO blocks almost all + * other ioctls. We let them keep working but they mostly fail since no + * IOAS should exist. + */ + if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && type == VFIO_NOIOMMU_IOMMU) + return 0; + if (type != VFIO_TYPE1_IOMMU && type != VFIO_TYPE1v2_IOMMU) return -EINVAL; diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c index 595c7b2146f88c..64a336ebc99b9f 100644 --- a/drivers/vfio/iommufd.c +++ b/drivers/vfio/iommufd.c @@ -18,6 +18,10 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx) lockdep_assert_held(&vdev->dev_set->lock); + if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) && + vdev->group->type == VFIO_NO_IOMMU) + return 0; + /* * If the driver doesn't provide this op then it means the device does * not do DMA at all. So nothing to do. @@ -53,6 +57,10 @@ void vfio_iommufd_unbind(struct vfio_device *vdev) { lockdep_assert_held(&vdev->dev_set->lock); + if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) && + vdev->group->type == VFIO_NO_IOMMU) + return; + if (vdev->ops->unbind_iommufd) vdev->ops->unbind_iommufd(vdev); } diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index f3c48b8c45627d..08c5b05a129c2c 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -749,10 +749,13 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group, if (!IS_ERR(iommufd)) { u32 ioas_id; - ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id); - if (ret) { - iommufd_ctx_put(group->iommufd); - goto out_unlock; + if (!IS_ENABLED(CONFIG_VFIO_NO_IOMMU) || + group->type != VFIO_NO_IOMMU) { + ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id); + if (ret) { + iommufd_ctx_put(group->iommufd); + goto out_unlock; + } } group->iommufd = iommufd;