On 1/3/2023 10:20 AM, Jason Gunthorpe wrote:
> On Tue, Dec 20, 2022 at 12:39:20PM -0800, Steve Sistare wrote:
>> When a vfio container is preserved across exec, the task does not change,
>> but it gets a new mm with locked_vm=0, and loses the count from existing
>> dma mappings. If the user later unmaps a dma mapping, locked_vm underflows
>> to a large unsigned value, and a subsequent dma map request fails with
>> ENOMEM in __account_locked_vm.
>>
>> To avoid underflow, grab and save the mm at the time a dma is mapped.
>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
>> task's mm, which may have changed. If the saved mm is dead, do nothing.
>>
>> locked_vm is incremented for existing mappings in a subsequent patch.
>>
>> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
>> Cc: stable@xxxxxxxxxxxxxxx
>> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
>> Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++----------------
>>  1 file changed, 11 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 144f5bb..71f980b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>  	struct task_struct	*task;
>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>  	unsigned long		*bitmap;
>> +	struct mm_struct	*mm;
>>  };
>>
>>  struct vfio_batch {
>> @@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>  	if (!npage)
>>  		return 0;
>>
>> -	mm = async ? get_task_mm(dma->task) : dma->task->mm;
>> -	if (!mm)
>> +	mm = dma->mm;
>> +	if (async && !mmget_not_zero(mm))
>>  		return -ESRCH; /* process exited */
>
> Just delete the async, the lock_acct always acts on the dma which
> always has a singular mm.
>
> Fix the few callers that need it to do the mmget_not_zero() before
> calling in.

Most of the callers pass async=true:

        ret = vfio_lock_acct(dma, lock_acct, false);
        vfio_lock_acct(dma, locked - unlocked, true);
        ret = vfio_lock_acct(dma, 1, true);
        vfio_lock_acct(dma, -unlocked, true);
        vfio_lock_acct(dma, -1, true);
        vfio_lock_acct(dma, -unlocked, true);
        ret = mm_lock_acct(task, mm, lock_cap, npage, false);
        mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true);
        vfio_lock_acct(dma, locked - unlocked, true);

Hoisting mmget_not_zero and mmput into the callers would make many of
these call sites messier.  Alex, what is your opinion?

>> @@ -794,8 +795,8 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
>>  	struct mm_struct *mm;
>>  	int ret;
>>
>> -	mm = get_task_mm(dma->task);
>> -	if (!mm)
>> +	mm = dma->mm;
>> +	if (!mmget_not_zero(mm))
>>  		return -ENODEV;
>
> eg things get all confused here where we have the mmget_not_zero
>
> But then we go ahead and call vfio_lock_acct() with true

Yes, I should pass false at this call site to optimize.

- Steve
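
P.S. For reference, a rough sketch of the alternative being discussed
(dropping the async flag and making the async callers take the mm
reference themselves).  This is a hypothetical illustration, not the
posted patch; it assumes the existing __account_locked_vm()-based body
of vfio_lock_acct() and the dma->mm field added by this series:

	/* Hypothetical sketch, not the applied patch. */
	static int vfio_lock_acct(struct vfio_dma *dma, long npage)
	{
		int ret;

		if (!npage)
			return 0;

		/* Caller must guarantee dma->mm stays alive across this call. */
		ret = mmap_write_lock_killable(dma->mm);
		if (!ret) {
			ret = __account_locked_vm(dma->mm, abs(npage), npage > 0,
						  dma->task, dma->lock_cap);
			mmap_write_unlock(dma->mm);
		}

		return ret;
	}

Each async caller would then need a bracket along these lines:

	if (!mmget_not_zero(dma->mm))
		return -ESRCH;	/* process exited */
	ret = vfio_lock_acct(dma, locked - unlocked);
	mmput(dma->mm);

which is the per-call-site boilerplate referred to above.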