On 1/9/2023 10:54 AM, Steven Sistare wrote:
> On 1/9/2023 8:52 AM, Jason Gunthorpe wrote:
>> On Fri, Jan 06, 2023 at 10:14:57AM -0500, Steven Sistare wrote:
>>> On 1/3/2023 2:20 PM, Jason Gunthorpe wrote:
>>>> On Tue, Jan 03, 2023 at 01:12:53PM -0500, Steven Sistare wrote:
>>>>> On 1/3/2023 10:20 AM, Jason Gunthorpe wrote:
>>>>>> On Tue, Dec 20, 2022 at 12:39:20PM -0800, Steve Sistare wrote:
>>>>>>> When a vfio container is preserved across exec, the task does not change,
>>>>>>> but it gets a new mm with locked_vm=0, and loses the count from existing
>>>>>>> dma mappings.  If the user later unmaps a dma mapping, locked_vm underflows
>>>>>>> to a large unsigned value, and a subsequent dma map request fails with
>>>>>>> ENOMEM in __account_locked_vm.
>>>>>>>
>>>>>>> To avoid underflow, grab and save the mm at the time a dma is mapped.
>>>>>>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
>>>>>>> task's mm, which may have changed.  If the saved mm is dead, do nothing.
>>>>>>>
>>>>>>> locked_vm is incremented for existing mappings in a subsequent patch.
>>>>>>>
>>>>>>> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
>>>>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>>>>> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
>>>>>>> Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
>>>>>>> ---
>>>>>>>  drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++----------------
>>>>>>>  1 file changed, 11 insertions(+), 16 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>>>>> index 144f5bb..71f980b 100644
>>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>>>>>>  	struct task_struct	*task;
>>>>>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>>>>>>  	unsigned long		*bitmap;
>>>>>>> +	struct mm_struct	*mm;
>>>>>>>  };
>>>>>>>
>>>>>>>  struct vfio_batch {
>>>>>>> @@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>>>>>>  	if (!npage)
>>>>>>>  		return 0;
>>>>>>>
>>>>>>> -	mm = async ? get_task_mm(dma->task) : dma->task->mm;
>>>>>>> -	if (!mm)
>>>>>>> +	mm = dma->mm;
>>>>>>> +	if (async && !mmget_not_zero(mm))
>>>>>>>  		return -ESRCH;	/* process exited */
>>>>>>
>>>>>> Just delete the async, the lock_acct always acts on the dma which
>>>>>> always has a singular mm.
>>>>>>
>>>>>> Fix the few callers that need it to do the mmget_not_zero() before
>>>>>> calling in.
>>>>>
>>>>> Most of the callers pass async=true:
>>>>>     ret = vfio_lock_acct(dma, lock_acct, false);
>>>>>     vfio_lock_acct(dma, locked - unlocked, true);
>>>>>     ret = vfio_lock_acct(dma, 1, true);
>>>>>     vfio_lock_acct(dma, -unlocked, true);
>>>>>     vfio_lock_acct(dma, -1, true);
>>>>>     vfio_lock_acct(dma, -unlocked, true);
>>>>>     ret = mm_lock_acct(task, mm, lock_cap, npage, false);
>>>>>     mm_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true);
>>>>>     vfio_lock_acct(dma, locked - unlocked, true);
>>>>
>>>> Seems like if you make a lock_sub_acct() function that does the -1*
>>>> and does the mmget it will be OK?
>>>
>>> Do you mean, provide two versions of vfio_lock_acct?  Simplified:
>>>
>>> vfio_lock_acct()
>>> {
>>>     mm_lock_acct()
>>>     dma->locked_vm += npage;
>>> }
>>>
>>> vfio_lock_acct_async()
>>> {
>>>     mmget_not_zero(dma->mm)
>>>
>>>     mm_lock_acct()
>>>     dma->locked_vm += npage;
>>>
>>>     mmput(dma->mm);
>>> }
>>
>> I was thinking more like
>>
>> ()
>>   mmget_not_zero(dma->mm)
>>   mm->locked_vm -= npage
>    ^^^^^^
> Is this shorthand for open coding __account_locked_vm?  If so, we are
> essentially saying the same thing.  My function vfio_lock_acct_async calls
> mm_lock_acct which calls __account_locked_vm.
>
> But, your vfio_lock_acct_subtract does not call mmput, so maybe I still don't
> grok your suggestion.
>
> FWIW here are my functions with all error checking:
>
> static int mm_lock_acct(struct task_struct *task, struct mm_struct *mm,
>                         bool lock_cap, long npage)
> {
>         int ret = mmap_write_lock_killable(mm);
>
>         if (!ret) {
>                 ret = __account_locked_vm(mm, abs(npage), npage > 0, task,
>                                           lock_cap);
>                 mmap_write_unlock(mm);
>         }
>
>         return ret;
> }
>
> static int vfio_lock_acct(struct vfio_dma *dma, long npage)
> {
>         int ret;
>
>         if (!npage)
>                 return 0;
>
>         ret = mm_lock_acct(dma->task, dma->mm, dma->lock_cap, npage);
>         if (!ret)
>                 dma->locked_vm += npage;
>
>         return ret;
> }
>
> static int vfio_lock_acct_async(struct vfio_dma *dma, long npage)
> {
>         int ret;
>
>         if (!npage)
>                 return 0;
>
>         if (!mmget_not_zero(dma->mm))
>                 return -ESRCH;  /* process exited */
>
>         ret = mm_lock_acct(dma->task, dma->mm, dma->lock_cap, npage);
>         if (!ret)
>                 dma->locked_vm += npage;
>
>         mmput(dma->mm);
>
>         return ret;
> }

Let's leave the async arg and vfio_lock_acct as is.  We are over-polishing a
small style issue, in pre-existing code, in a soon-to-be dead-end code base.

- Steve