On Tue, 13 Dec 2022 16:01:21 -0500 Steven Sistare <steven.sistare@xxxxxxxxxx> wrote: > On 12/13/2022 3:23 PM, Alex Williamson wrote: > > On Tue, 13 Dec 2022 11:40:56 -0800 > > Steve Sistare <steven.sistare@xxxxxxxxxx> wrote: > > > >> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR > >> interfaces, locked_vm of the new mm becomes 0. If the user later unmaps a > >> dma mapping, locked_vm underflows to a large unsigned value, and a > >> subsequent dma map request fails with ENOMEM in __account_locked_vm. > >> > >> To avoid underflow, do not decrement locked_vm during unmap if the > >> dma's mm has changed. To restore the correct locked_vm count, when > >> VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, add > >> the mapping's pinned page count to the new mm->locked_vm, subject > >> to the rlimit. Now that mediated devices are excluded when using > >> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of > >> the mapping. > > > > Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") > > > > > >> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx> > >> --- > >> drivers/vfio/vfio_iommu_type1.c | 23 +++++++++++++++++++---- > >> 1 file changed, 19 insertions(+), 4 deletions(-) > >> > >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > >> index 80bdb4d..35a1a52 100644 > >> --- a/drivers/vfio/vfio_iommu_type1.c > >> +++ b/drivers/vfio/vfio_iommu_type1.c > >> @@ -100,6 +100,7 @@ struct vfio_dma { > >> struct task_struct *task; > >> struct rb_root pfn_list; /* Ex-user pinned pfn list */ > >> unsigned long *bitmap; > >> + struct mm_struct *mm; > >> }; > >> > >> struct vfio_batch { > >> @@ -1165,7 +1166,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma, > >> &iotlb_gather); > >> } > >> > >> - if (do_accounting) { > >> + if (do_accounting && current->mm == dma->mm) { > > > > > > This seems incompatible with ffed0518d871 ("vfio: remove useless > > judgement") where we no longer assume that the unmap mm is the same as > > the mapping mm. > > They are compatible. My fix allows another task to unmap, but only decreases > locked_vm if the current mm matches the original mm that locked it. And the > "original" mm is updated by MAP_FLAG_VADDR. It seems like there's either a bug fix or behavioral change to ffed0518d871 then. What mm were we previously accounting in their fork/exec scenario that we're not with this change? > > Does this need to get_task_mm(dma->task) and compare that mm to dma->mm > > to determine whether an exec w/o vaddr remapping has occurred? That's > > the only use case I can figure out where grabbing the mm for dma->mm > > actually makes any sense at all. > > The mm grab does detect an exec. Before exec, at map time, we get task and grab > its mm. During exec, task gets a new mm. The old mm becomes defunct, but we > still hold it and can examine its pointer address. This is describing exactly the test I'm asking about, if dma->task->mm no longer matches dma->mm then an exec has occurred w/o a subsequent vaddr remap. So why are we bringing current->mm into the equation? Thanks, Alex > The new code does not require that current == dma->task. > > >> vfio_lock_acct(dma, -unlocked, true); > >> return 0; > >> } > >> @@ -1178,6 +1179,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) > >> vfio_unmap_unpin(iommu, dma, true); > >> vfio_unlink_dma(iommu, dma); > >> put_task_struct(dma->task); > >> + mmdrop(dma->mm); > >> vfio_dma_bitmap_free(dma); > >> if (dma->vaddr_invalid) { > >> iommu->vaddr_invalid_count--; > >> @@ -1623,9 +1625,20 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > >> dma->size != size) { > >> ret = -EINVAL; > >> } else { > >> - dma->vaddr = vaddr; > >> - dma->vaddr_invalid = false; > >> - iommu->vaddr_invalid_count--; > >> + if (current->mm != dma->mm) { > >> + ret = vfio_lock_acct(dma, size >> PAGE_SHIFT, > >> + 0); > >> + if (!ret) { > >> + mmdrop(dma->mm); > >> + dma->mm = current->mm; > >> + mmgrab(dma->mm); > >> + } > >> + } > >> + if (!ret) { > >> + dma->vaddr = vaddr; > >> + dma->vaddr_invalid = false; > >> + iommu->vaddr_invalid_count--; > >> + } > > > > Poor flow, shouldn't this be: > > > > if (current->mm != dma->mm) { > > ret = vfio_lock_acct(dma, > > size >> PAGE_SHIFT, 0); > > if (ret) > > goto out_unlock; > > > > mmdrop(dma->mm); > > dma->mm = current->mm; > > mmgrab(dma->mm); > > } > > dma->vaddr = vaddr; > > dma->vaddr_invalid = false; > > iommu->vaddr_invalid_count--; > > Better, will do, thanks. > > - Steve > > >> wake_up_all(&iommu->vaddr_wait); > >> } > >> goto out_unlock; > >> @@ -1683,6 +1696,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > >> get_task_struct(current->group_leader); > >> dma->task = current->group_leader; > >> dma->lock_cap = capable(CAP_IPC_LOCK); > >> + dma->mm = dma->task->mm; > >> + mmgrab(dma->mm); > >> > >> dma->pfn_list = RB_ROOT; > >> > > >