On Tue, 13 Dec 2022 11:40:56 -0800 Steve Sistare <steven.sistare@xxxxxxxxxx> wrote: > When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR > interfaces, locked_vm of the new mm becomes 0. If the user later unmaps a > dma mapping, locked_vm underflows to a large unsigned value, and a > subsequent dma map request fails with ENOMEM in __account_locked_vm. > > To avoid underflow, do not decrement locked_vm during unmap if the > dma's mm has changed. To restore the correct locked_vm count, when > VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, add > the mapping's pinned page count to the new mm->locked_vm, subject > to the rlimit. Now that mediated devices are excluded when using > VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of > the mapping. > Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") > Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx> > --- > drivers/vfio/vfio_iommu_type1.c | 23 +++++++++++++++++++---- > 1 file changed, 19 insertions(+), 4 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 80bdb4d..35a1a52 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -100,6 +100,7 @@ struct vfio_dma { > struct task_struct *task; > struct rb_root pfn_list; /* Ex-user pinned pfn list */ > unsigned long *bitmap; > + struct mm_struct *mm; > }; > > struct vfio_batch { > @@ -1165,7 +1166,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma, > &iotlb_gather); > } > > - if (do_accounting) { > + if (do_accounting && current->mm == dma->mm) { This seems incompatible with ffed0518d871 ("vfio: remove useless judgement") where we no longer assume that the unmap mm is the same as the mapping mm. Does this need to get_task_mm(dma->task) and compare that mm to dma->mm to determine whether an exec w/o vaddr remapping has occurred? That's the only use case I can figure out where grabbing the mm for dma->mm actually makes any sense at all. > vfio_lock_acct(dma, -unlocked, true); > return 0; > } > @@ -1178,6 +1179,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) > vfio_unmap_unpin(iommu, dma, true); > vfio_unlink_dma(iommu, dma); > put_task_struct(dma->task); > + mmdrop(dma->mm); > vfio_dma_bitmap_free(dma); > if (dma->vaddr_invalid) { > iommu->vaddr_invalid_count--; > @@ -1623,9 +1625,20 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > dma->size != size) { > ret = -EINVAL; > } else { > - dma->vaddr = vaddr; > - dma->vaddr_invalid = false; > - iommu->vaddr_invalid_count--; > + if (current->mm != dma->mm) { > + ret = vfio_lock_acct(dma, size >> PAGE_SHIFT, > + 0); > + if (!ret) { > + mmdrop(dma->mm); > + dma->mm = current->mm; > + mmgrab(dma->mm); > + } > + } > + if (!ret) { > + dma->vaddr = vaddr; > + dma->vaddr_invalid = false; > + iommu->vaddr_invalid_count--; > + } Poor flow, shouldn't this be: if (current->mm != dma->mm) { ret = vfio_lock_acct(dma, size >> PAGE_SHIFT, 0); if (ret) goto out_unlock; mmdrop(dma->mm); dma->mm = current->mm; mmgrab(dma->mm); } dma->vaddr = vaddr; dma->vaddr_invalid = false; iommu->vaddr_invalid_count--; Thanks, Alex > wake_up_all(&iommu->vaddr_wait); > } > goto out_unlock; > @@ -1683,6 +1696,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > get_task_struct(current->group_leader); > dma->task = current->group_leader; > dma->lock_cap = capable(CAP_IPC_LOCK); > + dma->mm = dma->task->mm; > + mmgrab(dma->mm); > > dma->pfn_list = RB_ROOT; >