On Tue, 13 Dec 2022 07:46:56 -0800 Steve Sistare <steven.sistare@xxxxxxxxxx> wrote: > When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR > interfaces, locked_vm of the new mm becomes 0. If the user later unmaps a > dma mapping, locked_vm underflows to a large unsigned value, and a > subsequent dma map request fails with ENOMEM in __account_locked_vm. > > To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, > add the mapping's pinned page count to the new mm->locked_vm, subject to > the rlimit. Now that mediated devices are excluded when using > VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the > mapping. > > Underflow will not occur when all dma mappings are invalidated before exec. > An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR > will fail with EINVAL because the mapping is in the vaddr_invalid state. Where is this enforced? > Underflow may still occur in a buggy application that fails to invalidate > all before exec. > > Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx> > --- > drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index f81e925..e5a02f8 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -100,6 +100,7 @@ struct vfio_dma { > struct task_struct *task; > struct rb_root pfn_list; /* Ex-user pinned pfn list */ > unsigned long *bitmap; > + struct mm_struct *mm; > }; > > struct vfio_batch { > @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) > vfio_unmap_unpin(iommu, dma, true); > vfio_unlink_dma(iommu, dma); > put_task_struct(dma->task); > + mmdrop(dma->mm); > vfio_dma_bitmap_free(dma); > if (dma->vaddr_invalid) { > iommu->vaddr_invalid_count--; > @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > dma->vaddr = vaddr; > dma->vaddr_invalid = false; > iommu->vaddr_invalid_count--; > + if (current->mm != dma->mm) { > + mmdrop(dma->mm); > + dma->mm = current->mm; > + mmgrab(dma->mm); > + ret = vfio_lock_acct(dma, size >> PAGE_SHIFT, > + 0); What does it actually mean if this fails? The pages are still pinned. lock_vm doesn't get updated. Underflow can still occur. Thanks, Alex > + } > wake_up_all(&iommu->vaddr_wait); > } > goto out_unlock; > @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > get_task_struct(current->group_leader); > dma->task = current->group_leader; > dma->lock_cap = capable(CAP_IPC_LOCK); > + dma->mm = dma->task->mm; > + mmgrab(dma->mm); > > dma->pfn_list = RB_ROOT; >