On 1/9/2023 8:52 AM, Jason Gunthorpe wrote:
> On Fri, Jan 06, 2023 at 10:14:57AM -0500, Steven Sistare wrote:
>> On 1/3/2023 2:20 PM, Jason Gunthorpe wrote:
>>> On Tue, Jan 03, 2023 at 01:12:53PM -0500, Steven Sistare wrote:
>>>> On 1/3/2023 10:20 AM, Jason Gunthorpe wrote:
>>>>> On Tue, Dec 20, 2022 at 12:39:20PM -0800, Steve Sistare wrote:
>>>>>> When a vfio container is preserved across exec, the task does not change,
>>>>>> but it gets a new mm with locked_vm=0, and loses the count from existing
>>>>>> dma mappings.  If the user later unmaps a dma mapping, locked_vm underflows
>>>>>> to a large unsigned value, and a subsequent dma map request fails with
>>>>>> ENOMEM in __account_locked_vm.
>>>>>>
>>>>>> To avoid underflow, grab and save the mm at the time a dma is mapped.
>>>>>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
>>>>>> task's mm, which may have changed.  If the saved mm is dead, do nothing.
>>>>>>
>>>>>> locked_vm is incremented for existing mappings in a subsequent patch.
>>>>>>
>>>>>> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
>>>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>>>> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
>>>>>> Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
>>>>>> ---
>>>>>>  drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++----------------
>>>>>>  1 file changed, 11 insertions(+), 16 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>>>> index 144f5bb..71f980b 100644
>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>>>>>  	struct task_struct	*task;
>>>>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>>>>>  	unsigned long		*bitmap;
>>>>>> +	struct mm_struct	*mm;
>>>>>>  };
>>>>>>
>>>>>>  struct vfio_batch {
>>>>>> @@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>>>>>  	if (!npage)
>>>>>>  		return 0;
>>>>>>
>>>>>> -	mm = async ? get_task_mm(dma->task) : dma->task->mm;
>>>>>> -	if (!mm)
>>>>>> +	mm = dma->mm;
>>>>>> +	if (async && !mmget_not_zero(mm))
>>>>>>  		return -ESRCH;	/* process exited */
>>>>>
>>>>> Just delete the async, the lock_acct always acts on the dma which
>>>>> always has a singular mm.
>>>>>
>>>>> FIx the few callers that need it to do the mmget_no_zero() before
>>>>> calling in.
>>>>
>>>> Most of the callers pass async=true:
>>>>   ret = vfio_lock_acct(dma, lock_acct, false);
>>>>   vfio_lock_acct(dma, locked - unlocked, true);
>>>>   ret = vfio_lock_acct(dma, 1, true);
>>>>   vfio_lock_acct(dma, -unlocked, true);
>>>>   vfio_lock_acct(dma, -1, true);
>>>>   vfio_lock_acct(dma, -unlocked, true);
>>>>   ret = mm_lock_acct(task, mm, lock_cap, npage, false);
>>>>   mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true);
>>>>   vfio_lock_acct(dma, locked - unlocked, true);
>>>
>>> Seems like if you make a lock_sub_acct() function that does the -1*
>>> and does the mmget it will be OK?
>>
>> Do you mean, provide two versions of vfio_lock_acct?  Simplified:
>>
>>   vfio_lock_acct()
>>   {
>>       mm_lock_acct()
>>       dma->locked_vm += npage;
>>   }
>>
>>   vfio_lock_acct_async()
>>   {
>>       mmget_not_zero(dma->mm)
>>
>>       mm_lock_acct()
>>       dma->locked_vm += npage;
>>
>>       mmput(dma->mm);
>>   }
>
> I was thinking more like
>
>   ()
>     mmget_not_zero(dma->mm)
>     mm->locked_vm -= npage
            ^^^^^^

Is this shorthand for open coding __account_locked_vm?  If so, we are
essentially saying the same thing.
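In other words, I read your sketch as something like this, where the mmap
locking and the final mmput are my additions, so correct me if that is not
what you meant:

static void vfio_lock_acct_subtract(struct vfio_dma *dma, long npage)
{
	struct mm_struct *mm = dma->mm;

	/* mm already gone: nothing left to unaccount */
	if (!mmget_not_zero(mm))
		return;

	mmap_write_lock(mm);
	mm->locked_vm -= npage;		/* open-coded __account_locked_vm */
	mmap_write_unlock(mm);

	mmput(mm);
}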
My function vfio_lock_acct_async calls mm_lock_acct which calls
__account_locked_vm.  But, your vfio_lock_acct_subtract does not call
mmput, so maybe I still don't grok your suggestion.

FWIW here are my functions with all error checking:

static int mm_lock_acct(struct task_struct *task, struct mm_struct *mm,
			bool lock_cap, long npage)
{
	int ret = mmap_write_lock_killable(mm);

	if (!ret) {
		ret = __account_locked_vm(mm, abs(npage), npage > 0, task,
					  lock_cap);
		mmap_write_unlock(mm);
	}
	return ret;
}

static int vfio_lock_acct(struct vfio_dma *dma, long npage)
{
	int ret;

	if (!npage)
		return 0;

	ret = mm_lock_acct(dma->task, dma->mm, dma->lock_cap, npage);
	if (!ret)
		dma->locked_vm += npage;

	return ret;
}

static int vfio_lock_acct_async(struct vfio_dma *dma, long npage)
{
	int ret;

	if (!npage)
		return 0;

	if (!mmget_not_zero(dma->mm))
		return -ESRCH;	/* process exited */

	ret = mm_lock_acct(dma->task, dma->mm, dma->lock_cap, npage);
	if (!ret)
		dma->locked_vm += npage;

	mmput(dma->mm);
	return ret;
}

- Steve
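P.S. To make the sync/async split concrete, this is roughly how I picture a
pin-side and an unpin-side call site using the two variants; the lines are
paraphrased from the async=true/false list above, not taken from the actual
diff:

	/* pin path, running in the context of the mm being charged */
	ret = vfio_lock_acct(dma, lock_acct);

	/*
	 * unpin/unmap/release paths, where dma->mm may no longer be
	 * current->mm (or the task may have exited): the async variant
	 * takes the mm reference internally and returns -ESRCH if the
	 * mm is already gone.
	 */
	vfio_lock_acct_async(dma, -unlocked);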