> From: Jason Gunthorpe <jgg@xxxxxxxxxx> > Sent: Thursday, November 10, 2022 3:53 AM > > On Wed, Nov 09, 2022 at 10:18:09AM -0700, Alex Williamson wrote: > > > DPDK supports no-iommu mode. > > Er? Huh? How? I thought no-iommu was for applications that didn't do > DMA? How is DPDK getting packets in/out without DMA? I guess it snoops > in /proc/ or something to learn PFNs of mlock'd memory? <shudder> iirc dpdk started with UIO plus various tricks (root privilege, hugepage, etc.) to lock and learn PFN's from pagemap. Then when migrating it to vfio the no-iommu option was introduced to provide UIO compatibility. > > > I agree that it's very useful for testing, I'm certainly not suggesting > > to give up, but I'm not sure where no-iommu lives when iommufd owns > > /dev/vfio/vfio. Given the unsafe interrupts discussion, it doesn't > > seem like the type of thing that would be a priority for iommufd. > > Ah, I think the bit below will do the job, I'm not sure without doing > some testing (and I don't think I have the necessary Intel NIC for > DPDK). The basic point is no-iommu literally means 'do not use iommufd > at all, do not create an iommufd_device or an iommufd_access'. VFIO > can easially do that on its own. > > The only slightly messy bit is that uAPI requires the SET_CONTAINER > which we can just NOP in vfio_compat. With more checks it could have > higher fidelity of error cases, but not sure it is worth it. > > When we talk about the device cdev flow then I suggest that no-iommu > simply requires -1 for the iommufd during bind - ie no iommufd is > used or accepted and that is how the userspace knows/confirms it is in > no-iommu mode. > > > We're on a path where vfio accepts an iommufd as a container, and > > ultimately iommufd becomes the container provider, supplanting the > > IOMMU driver registration aspect of vfio. I absolutely want type1 and > > spapr backends to get replaced by iommufd, but reluctance to support > > aspects of vfio "legacy" behavior doesn't give me warm fuzzies about a > > wholesale hand-off of the container to a different subsystem, for > > example vs an iommufd shim spoofing type1 support. > > Well, I will agree to do everything required, just let's go through the > process to understand the security situation and ensure we are still > doing things the right way. > > > Unfortunately we no longer have a CONFIG_EXPERIMENTAL option to hide > > behind for disabling VFIO_CONTAINER, so regardless of our intentions > > that a transition is some time off, it may become an issue sooner than > > we expect. > > We can add kconfig text discouraging that? > > diff --git a/drivers/iommu/iommufd/vfio_compat.c > b/drivers/iommu/iommufd/vfio_compat.c > index dbef3274803336..81f7594cfff8e0 100644 > --- a/drivers/iommu/iommufd/vfio_compat.c > +++ b/drivers/iommu/iommufd/vfio_compat.c > @@ -259,6 +259,14 @@ static int iommufd_vfio_set_iommu(struct > iommufd_ctx *ictx, unsigned long type) > struct iommufd_ioas *ioas = NULL; > int rc = 0; > > + /* > + * Emulation for NOIMMU is imperfect in that VFIO blocks almost all > + * other ioctls. We let them keep working but they mostly fail since no > + * IOAS should exist. > + */ > + if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && type == > VFIO_NOIOMMU_IOMMU) > + return 0; > + > if (type != VFIO_TYPE1_IOMMU && type != VFIO_TYPE1v2_IOMMU) > return -EINVAL; > also need a check in iommufd_vfio_check_extension() so only VFIO_NOIOMMU_IOMMU is supported in no-iommu mode. > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c > index 595c7b2146f88c..64a336ebc99b9f 100644 > --- a/drivers/vfio/iommufd.c > +++ b/drivers/vfio/iommufd.c > @@ -18,6 +18,10 @@ int vfio_iommufd_bind(struct vfio_device *vdev, > struct iommufd_ctx *ictx) > > lockdep_assert_held(&vdev->dev_set->lock); > > + if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) && > + vdev->group->type == VFIO_NO_IOMMU) > + return 0; > + > /* > * If the driver doesn't provide this op then it means the device does > * not do DMA at all. So nothing to do. > @@ -53,6 +57,10 @@ void vfio_iommufd_unbind(struct vfio_device *vdev) > { > lockdep_assert_held(&vdev->dev_set->lock); > > + if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) && > + vdev->group->type == VFIO_NO_IOMMU) > + return; > + > if (vdev->ops->unbind_iommufd) > vdev->ops->unbind_iommufd(vdev); > } > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c > index f3c48b8c45627d..08c5b05a129c2c 100644 > --- a/drivers/vfio/vfio_main.c > +++ b/drivers/vfio/vfio_main.c > @@ -749,10 +749,13 @@ static int vfio_group_ioctl_set_container(struct > vfio_group *group, > if (!IS_ERR(iommufd)) { > u32 ioas_id; > > - ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id); > - if (ret) { > - iommufd_ctx_put(group->iommufd); > - goto out_unlock; > + if (!IS_ENABLED(CONFIG_VFIO_NO_IOMMU) || > + group->type != VFIO_NO_IOMMU) { > + ret = iommufd_vfio_compat_ioas_id(iommufd, > &ioas_id); > + if (ret) { > + iommufd_ctx_put(group->iommufd); > + goto out_unlock; > + } > } with above I suppose other ioctls (map/unmap/etc.) are implicitly blocked given get_compat_ioas() will fail in those paths. this is good. btw vfio container requires exact match between group->type and container->noiommu, i.e. noiommu group can be only attached to noiommu container. this is another thing to be paired up.