On Fri, Jan 06, 2023 at 10:14:57AM -0500, Steven Sistare wrote:
> On 1/3/2023 2:20 PM, Jason Gunthorpe wrote:
> > On Tue, Jan 03, 2023 at 01:12:53PM -0500, Steven Sistare wrote:
> >> On 1/3/2023 10:20 AM, Jason Gunthorpe wrote:
> >>> On Tue, Dec 20, 2022 at 12:39:20PM -0800, Steve Sistare wrote:
> >>>> When a vfio container is preserved across exec, the task does not change,
> >>>> but it gets a new mm with locked_vm=0, and loses the count from existing
> >>>> dma mappings.  If the user later unmaps a dma mapping, locked_vm underflows
> >>>> to a large unsigned value, and a subsequent dma map request fails with
> >>>> ENOMEM in __account_locked_vm.
> >>>>
> >>>> To avoid underflow, grab and save the mm at the time a dma is mapped.
> >>>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
> >>>> task's mm, which may have changed.  If the saved mm is dead, do nothing.
> >>>>
> >>>> locked_vm is incremented for existing mappings in a subsequent patch.
> >>>>
> >>>> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
> >>>> Cc: stable@xxxxxxxxxxxxxxx
> >>>> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
> >>>> Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
> >>>> ---
> >>>>  drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++----------------
> >>>>  1 file changed, 11 insertions(+), 16 deletions(-)
> >>>>
> >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>>> index 144f5bb..71f980b 100644
> >>>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>>> @@ -100,6 +100,7 @@ struct vfio_dma {
> >>>>  	struct task_struct	*task;
> >>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
> >>>>  	unsigned long		*bitmap;
> >>>> +	struct mm_struct	*mm;
> >>>>  };
> >>>>
> >>>>  struct vfio_batch {
> >>>> @@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> >>>>  	if (!npage)
> >>>>  		return 0;
> >>>>
> >>>> -	mm = async ? get_task_mm(dma->task) : dma->task->mm;
> >>>> -	if (!mm)
> >>>> +	mm = dma->mm;
> >>>> +	if (async && !mmget_not_zero(mm))
> >>>>  		return -ESRCH; /* process exited */
> >>>
> >>> Just delete the async, the lock_acct always acts on the dma which
> >>> always has a singular mm.
> >>>
> >>> Fix the few callers that need it to do the mmget_not_zero() before
> >>> calling in.
> >>
> >> Most of the callers pass async=true:
> >>     ret = vfio_lock_acct(dma, lock_acct, false);
> >>     vfio_lock_acct(dma, locked - unlocked, true);
> >>     ret = vfio_lock_acct(dma, 1, true);
> >>     vfio_lock_acct(dma, -unlocked, true);
> >>     vfio_lock_acct(dma, -1, true);
> >>     vfio_lock_acct(dma, -unlocked, true);
> >>     ret = mm_lock_acct(task, mm, lock_cap, npage, false);
> >>     mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true);
> >>     vfio_lock_acct(dma, locked - unlocked, true);
> >
> > Seems like if you make a lock_sub_acct() function that does the -1*
> > and does the mmget it will be OK?
>
> Do you mean, provide two versions of vfio_lock_acct?  Simplified:
>
> vfio_lock_acct()
> {
>         mm_lock_acct()
>         dma->locked_vm += npage;
> }
>
> vfio_lock_acct_async()
> {
>         mmget_not_zero(dma->mm)
>
>         mm_lock_acct()
>         dma->locked_vm += npage;
>
>         mmput(dma->mm);
> }

I was thinking more like

 vfio_lock_acct_subtract()
     mmget_not_zero(dma->mm)
     mm->locked_vm -= npage

Jason
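
For concreteness, Jason's vfio_lock_acct_subtract() idea might be fleshed
out roughly as below.  This is a minimal sketch, not code from any posted
patch: it assumes the dma->mm field added by the patch above, plus the
dma->locked_vm counter and the mm_lock_acct() signature exactly as they
appear in Steven's simplified reply.

    /* Sketch only: decrement-side accounting against the saved mm. */
    static void vfio_lock_acct_subtract(struct vfio_dma *dma, long npage)
    {
            /*
             * The mapping process may have exited by the time an unmap
             * arrives; if its mm is already dead, there is nothing left
             * to unaccount, so skip silently rather than fail.
             */
            if (!mmget_not_zero(dma->mm))
                    return;

            mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true);
            dma->locked_vm -= npage;

            mmput(dma->mm);
    }

A subtract-only helper like this would confine the mmget_not_zero()/mmput()
pair to the decrement (unmap) paths, which are presumably the only callers
that can run after the process has exited, while the increment paths can
continue to treat a dead mm as an error.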