On 12/13/2022 12:31 PM, Alex Williamson wrote: > On Tue, 13 Dec 2022 11:54:33 -0500 > Steven Sistare <steven.sistare@xxxxxxxxxx> wrote: > >> On 12/13/2022 11:26 AM, Alex Williamson wrote: >>> On Tue, 13 Dec 2022 07:46:55 -0800 >>> Steve Sistare <steven.sistare@xxxxxxxxxx> wrote: >>> >>>> Disable the VFIO_UPDATE_VADDR capability if mediated devices are present. >>>> Their kernel threads could be blocked indefinitely by a misbehaving >>>> userland while trying to pin/unpin pages while vaddrs are being updated. >>> >>> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") >>> >>>> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx> >>>> --- >>>> drivers/vfio/vfio_iommu_type1.c | 25 ++++++++++++++++++++++++- >>>> include/uapi/linux/vfio.h | 6 +++++- >>>> 2 files changed, 29 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >>>> index 23c24fe..f81e925 100644 >>>> --- a/drivers/vfio/vfio_iommu_type1.c >>>> +++ b/drivers/vfio/vfio_iommu_type1.c >>>> @@ -1343,6 +1343,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, >>>> >>>> mutex_lock(&iommu->lock); >>>> >>>> + /* Cannot update vaddr if mdev is present. */ >>>> + if (invalidate_vaddr && !list_empty(&iommu->emulated_iommu_groups)) >>>> + goto unlock; >>>> + >>>> pgshift = __ffs(iommu->pgsize_bitmap); >>>> pgsize = (size_t)1 << pgshift; >>>> >>>> @@ -2189,6 +2193,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, >>>> >>>> mutex_lock(&iommu->lock); >>>> >>>> + /* Prevent an mdev from sneaking in while vaddr flags are used. */ >>>> + if (iommu->vaddr_invalid_count && type == VFIO_EMULATED_IOMMU) >>>> + goto out_unlock; >>> >>> Why only mdev devices? If we restrict that the user cannot attach a >>> group while there are invalid vaddrs, and the pin/unpin pages and >>> dma_rw interfaces are restricted to cases where vaddr_invalid_count is >>> zero, then we can get rid of all the code to handle waiting for vaddrs. >>> ie. we could still revert: >>> >>> 898b9eaeb3fe ("vfio/type1: block on invalid vaddr") >>> 487ace134053 ("vfio/type1: implement notify callback") >>> ec5e32940cc9 ("vfio: iommu driver notify callback") >>> >>> It appears to me it might be easiest to lead with a clean revert of >>> these, then follow-up imposing the usage restrictions, and I'd go ahead >>> and add WARN_ON error paths to the pin/unpin/dma_rw paths to make sure >>> nobody enters those paths with an elevated invalid count. Thanks, >> >> Will do. I think I will put the revert at the end, though, as dead code >> clean up. That patch will be larger, and if it is judged to be too large >> for stable, it can be omitted from stable. >> >> - Steve >> >>>> + >>>> /* Check for duplicates */ >>>> if (vfio_iommu_find_iommu_group(iommu, iommu_group)) >>>> goto out_unlock; >>>> @@ -2660,6 +2668,20 @@ static int vfio_domains_have_enforce_cache_coherency(struct vfio_iommu *iommu) >>>> return ret; >>>> } >>>> >>>> +/* >>>> + * Disable this feature if mdevs are present. They cannot safely pin/unpin >>>> + * while vaddrs are being updated. >>>> + */ >>>> +static int vfio_iommu_can_update_vaddr(struct vfio_iommu *iommu) >>>> +{ >>>> + int ret; >>>> + >>>> + mutex_lock(&iommu->lock); >>>> + ret = list_empty(&iommu->emulated_iommu_groups); >>>> + mutex_unlock(&iommu->lock); >>>> + return ret; >>>> +} > > I'd also keep this generic to what it's actually doing, ie. simply > reporting if emulated_iommu_groups are present, so it could be > something like: > > static bool vfio_iommu_has_emulated(struct vfio_iommu *iommu) OK. > OTOH, I'm not sure it actually makes sense to dynamically change > reported value, the IOMMU backend supports vaddr update, but there are > usage restrictions and there's no way that this test can't be racy w/ > other user actions. Yes, but VFIO_DMA_CC_IOMMU has the same issue and the same comment, "this capability is subject to change as groups are added or removed." > Does it have specific utility in userspace to test > for this immediately before a live-update, or would QEMU enable support > for the feature based only on some initial condition? Thanks, We always check immediately before live update. We also check at startup if only-cpr-capable is specified. Now the latter check will be done later in startup, after groups are added. - Steve >>>> + >>>> static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, >>>> unsigned long arg) >>>> { >>>> @@ -2668,8 +2690,9 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, >>>> case VFIO_TYPE1v2_IOMMU: >>>> case VFIO_TYPE1_NESTING_IOMMU: >>>> case VFIO_UNMAP_ALL: >>>> - case VFIO_UPDATE_VADDR: >>>> return 1; >>>> + case VFIO_UPDATE_VADDR: >>>> + return iommu && vfio_iommu_can_update_vaddr(iommu); >>>> case VFIO_DMA_CC_IOMMU: >>>> if (!iommu) >>>> return 0; >>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h >>>> index d7d8e09..6d36b84 100644 >>>> --- a/include/uapi/linux/vfio.h >>>> +++ b/include/uapi/linux/vfio.h >>>> @@ -49,7 +49,11 @@ >>>> /* Supports VFIO_DMA_UNMAP_FLAG_ALL */ >>>> #define VFIO_UNMAP_ALL 9 >>>> >>>> -/* Supports the vaddr flag for DMA map and unmap */ >>>> +/* >>>> + * Supports the vaddr flag for DMA map and unmap. Not supported for mediated >>>> + * devices, so this capability is subject to change as groups are added or >>>> + * removed. >>>> + */ >>>> #define VFIO_UPDATE_VADDR 10 >>>> >>>> /* >>> >> >