On 12/13/2022 1:02 PM, Alex Williamson wrote: > On Tue, 13 Dec 2022 07:46:56 -0800 > Steve Sistare <steven.sistare@xxxxxxxxxx> wrote: > >> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR >> interfaces, locked_vm of the new mm becomes 0. If the user later unmaps a >> dma mapping, locked_vm underflows to a large unsigned value, and a >> subsequent dma map request fails with ENOMEM in __account_locked_vm. >> >> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, >> add the mapping's pinned page count to the new mm->locked_vm, subject to >> the rlimit. Now that mediated devices are excluded when using >> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the >> mapping. >> >> Underflow will not occur when all dma mappings are invalidated before exec. >> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR >> will fail with EINVAL because the mapping is in the vaddr_invalid state. > > Where is this enforced? In vfio_dma_do_unmap: if (invalidate_vaddr) { if (dma->vaddr_invalid) { ... ret = -EINVAL; >> Underflow may still occur in a buggy application that fails to invalidate >> all before exec. >> >> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx> >> --- >> drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >> index f81e925..e5a02f8 100644 >> --- a/drivers/vfio/vfio_iommu_type1.c >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -100,6 +100,7 @@ struct vfio_dma { >> struct task_struct *task; >> struct rb_root pfn_list; /* Ex-user pinned pfn list */ >> unsigned long *bitmap; >> + struct mm_struct *mm; >> }; >> >> struct vfio_batch { >> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) >> vfio_unmap_unpin(iommu, dma, true); >> vfio_unlink_dma(iommu, dma); >> put_task_struct(dma->task); >> + mmdrop(dma->mm); >> vfio_dma_bitmap_free(dma); >> if (dma->vaddr_invalid) { >> iommu->vaddr_invalid_count--; >> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, >> dma->vaddr = vaddr; >> dma->vaddr_invalid = false; >> iommu->vaddr_invalid_count--; >> + if (current->mm != dma->mm) { >> + mmdrop(dma->mm); >> + dma->mm = current->mm; >> + mmgrab(dma->mm); >> + ret = vfio_lock_acct(dma, size >> PAGE_SHIFT, >> + 0); > > What does it actually mean if this fails? The pages are still pinned. > lock_vm doesn't get updated. Underflow can still occur. Thanks, If this fails, the user has locked additional memory after exec and before making this call -- more than was locked before exec -- and the rlimit is exceeded. A misbehaving application, which will only hurt itself. However, I should reorder these, and check ret before changing the other state. - Steve >> + } >> wake_up_all(&iommu->vaddr_wait); >> } >> goto out_unlock; >> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, >> get_task_struct(current->group_leader); >> dma->task = current->group_leader; >> dma->lock_cap = capable(CAP_IPC_LOCK); >> + dma->mm = dma->task->mm; >> + mmgrab(dma->mm); >> >> dma->pfn_list = RB_ROOT; >> >