On 12/13/2022 3:23 PM, Alex Williamson wrote: > On Tue, 13 Dec 2022 11:40:56 -0800 > Steve Sistare <steven.sistare@xxxxxxxxxx> wrote: > >> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR >> interfaces, locked_vm of the new mm becomes 0. If the user later unmaps a >> dma mapping, locked_vm underflows to a large unsigned value, and a >> subsequent dma map request fails with ENOMEM in __account_locked_vm. >> >> To avoid underflow, do not decrement locked_vm during unmap if the >> dma's mm has changed. To restore the correct locked_vm count, when >> VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, add >> the mapping's pinned page count to the new mm->locked_vm, subject >> to the rlimit. Now that mediated devices are excluded when using >> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of >> the mapping. > > Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") > > >> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx> >> --- >> drivers/vfio/vfio_iommu_type1.c | 23 +++++++++++++++++++---- >> 1 file changed, 19 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >> index 80bdb4d..35a1a52 100644 >> --- a/drivers/vfio/vfio_iommu_type1.c >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -100,6 +100,7 @@ struct vfio_dma { >> struct task_struct *task; >> struct rb_root pfn_list; /* Ex-user pinned pfn list */ >> unsigned long *bitmap; >> + struct mm_struct *mm; >> }; >> >> struct vfio_batch { >> @@ -1165,7 +1166,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma, >> &iotlb_gather); >> } >> >> - if (do_accounting) { >> + if (do_accounting && current->mm == dma->mm) { > > > This seems incompatible with ffed0518d871 ("vfio: remove useless > judgement") where we no longer assume that the unmap mm is the same as > the mapping mm. They are compatible. My fix allows another task to unmap, but only decreases locked_vm if the current mm matches the original mm that locked it. And the "original" mm is updated by MAP_FLAG_VADDR. > Does this need to get_task_mm(dma->task) and compare that mm to dma->mm > to determine whether an exec w/o vaddr remapping has occurred? That's > the only use case I can figure out where grabbing the mm for dma->mm > actually makes any sense at all. The mm grab does detect an exec. Before exec, at map time, we get task and grab its mm. During exec, task gets a new mm. The old mm becomes defunct, but we still hold it and can examine its pointer address. The new code does not require that current == dma->task. >> vfio_lock_acct(dma, -unlocked, true); >> return 0; >> } >> @@ -1178,6 +1179,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) >> vfio_unmap_unpin(iommu, dma, true); >> vfio_unlink_dma(iommu, dma); >> put_task_struct(dma->task); >> + mmdrop(dma->mm); >> vfio_dma_bitmap_free(dma); >> if (dma->vaddr_invalid) { >> iommu->vaddr_invalid_count--; >> @@ -1623,9 +1625,20 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, >> dma->size != size) { >> ret = -EINVAL; >> } else { >> - dma->vaddr = vaddr; >> - dma->vaddr_invalid = false; >> - iommu->vaddr_invalid_count--; >> + if (current->mm != dma->mm) { >> + ret = vfio_lock_acct(dma, size >> PAGE_SHIFT, >> + 0); >> + if (!ret) { >> + mmdrop(dma->mm); >> + dma->mm = current->mm; >> + mmgrab(dma->mm); >> + } >> + } >> + if (!ret) { >> + dma->vaddr = vaddr; >> + dma->vaddr_invalid = false; >> + iommu->vaddr_invalid_count--; >> + } > > Poor flow, shouldn't this be: > > if (current->mm != dma->mm) { > ret = vfio_lock_acct(dma, > size >> PAGE_SHIFT, 0); > if (ret) > goto out_unlock; > > mmdrop(dma->mm); > dma->mm = current->mm; > mmgrab(dma->mm); > } > dma->vaddr = vaddr; > dma->vaddr_invalid = false; > iommu->vaddr_invalid_count--; Better, will do, thanks. - Steve >> wake_up_all(&iommu->vaddr_wait); >> } >> goto out_unlock; >> @@ -1683,6 +1696,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, >> get_task_struct(current->group_leader); >> dma->task = current->group_leader; >> dma->lock_cap = capable(CAP_IPC_LOCK); >> + dma->mm = dma->task->mm; >> + mmgrab(dma->mm); >> >> dma->pfn_list = RB_ROOT; >> >