On 1/3/2023 2:20 PM, Jason Gunthorpe wrote:
> On Tue, Jan 03, 2023 at 01:12:53PM -0500, Steven Sistare wrote:
>> On 1/3/2023 10:20 AM, Jason Gunthorpe wrote:
>>> On Tue, Dec 20, 2022 at 12:39:20PM -0800, Steve Sistare wrote:
>>>> When a vfio container is preserved across exec, the task does not change,
>>>> but it gets a new mm with locked_vm=0, and loses the count from existing
>>>> dma mappings.  If the user later unmaps a dma mapping, locked_vm underflows
>>>> to a large unsigned value, and a subsequent dma map request fails with
>>>> ENOMEM in __account_locked_vm.
>>>>
>>>> To avoid underflow, grab and save the mm at the time a dma is mapped.
>>>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
>>>> task's mm, which may have changed.  If the saved mm is dead, do nothing.
>>>>
>>>> locked_vm is incremented for existing mappings in a subsequent patch.
>>>>
>>>> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
>>>> Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
>>>> ---
>>>>  drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++----------------
>>>>  1 file changed, 11 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>> index 144f5bb..71f980b 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>>>  	struct task_struct	*task;
>>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>>>  	unsigned long		*bitmap;
>>>> +	struct mm_struct	*mm;
>>>>  };
>>>>
>>>>  struct vfio_batch {
>>>> @@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>>>  	if (!npage)
>>>>  		return 0;
>>>>
>>>> -	mm = async ? get_task_mm(dma->task) : dma->task->mm;
>>>> -	if (!mm)
>>>> +	mm = dma->mm;
>>>> +	if (async && !mmget_not_zero(mm))
>>>>  		return -ESRCH; /* process exited */
>>>
>>> Just delete the async, the lock_acct always acts on the dma which
>>> always has a singular mm.
>>>
>>> Fix the few callers that need it to do the mmget_not_zero() before
>>> calling in.
>>
>> Most of the callers pass async=true:
>>
>>     ret = vfio_lock_acct(dma, lock_acct, false);
>>     vfio_lock_acct(dma, locked - unlocked, true);
>>     ret = vfio_lock_acct(dma, 1, true);
>>     vfio_lock_acct(dma, -unlocked, true);
>>     vfio_lock_acct(dma, -1, true);
>>     vfio_lock_acct(dma, -unlocked, true);
>>     ret = mm_lock_acct(task, mm, lock_cap, npage, false);
>>     mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true);
>>     vfio_lock_acct(dma, locked - unlocked, true);
>
> Seems like if you make a lock_sub_acct() function that does the -1*
> and does the mmget it will be OK?

Do you mean, provide two versions of vfio_lock_acct?  Simplified:

    vfio_lock_acct()
    {
            mm_lock_acct()
            dma->locked_vm += npage;
    }

    vfio_lock_acct_async()
    {
            mmget_not_zero(dma->mm)
            mm_lock_acct()
            dma->locked_vm += npage;
            mmput(dma->mm);
    }

If so, I will factor this into a separate patch, since it is cosmetic,
so stable can omit it.

- Steve
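
[For illustration only, the simplified split proposed above might flesh out
roughly as follows.  This is a sketch, not the posted or merged patch: the
exact mm_lock_acct() signature (here assumed to drop its async flag once the
split exists), the error handling, and the dma->locked_vm bookkeeping are
carried over from the simplified pseudocode, with the async variant taking
and dropping the mm reference that the saved dma->mm requires.]

    /* Sync variant: caller guarantees dma->mm is alive (e.g. current->mm). */
    static int vfio_lock_acct(struct vfio_dma *dma, long npage)
    {
            int ret;

            if (!npage)
                    return 0;

            /* Charge/uncharge against the mm saved when the dma was mapped. */
            ret = mm_lock_acct(dma->task, dma->mm, dma->lock_cap, npage);
            if (!ret)
                    dma->locked_vm += npage;

            return ret;
    }

    /* Async variant: the saved mm may already be dead, so pin it first. */
    static int vfio_lock_acct_async(struct vfio_dma *dma, long npage)
    {
            int ret;

            if (!mmget_not_zero(dma->mm))
                    return -ESRCH;  /* process exited */

            ret = vfio_lock_acct(dma, npage);
            mmput(dma->mm);
            return ret;
    }

With that split, the async boolean disappears from the callers listed above:
each call site simply picks vfio_lock_acct() or vfio_lock_acct_async()
according to whether it already holds a reference on the mm.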