On Mon, Feb 11, 2019 at 03:56:20PM -0700, Jason Gunthorpe wrote:
> On Mon, Feb 11, 2019 at 05:44:33PM -0500, Daniel Jordan wrote:
> > @@ -266,24 +267,15 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> >  	if (!mm)
> >  		return -ESRCH; /* process exited */
> >
> > -	ret = down_write_killable(&mm->mmap_sem);
> > -	if (!ret) {
> > -		if (npage > 0) {
> > -			if (!dma->lock_cap) {
> > -				unsigned long limit;
> > -
> > -				limit = task_rlimit(dma->task,
> > -						RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > +	pinned_vm = atomic64_add_return(npage, &mm->pinned_vm);
> >
> > -				if (mm->locked_vm + npage > limit)
> > -					ret = -ENOMEM;
> > -			}
> > +	if (npage > 0 && !dma->lock_cap) {
> > +		unsigned long limit = task_rlimit(dma->task, RLIMIT_MEMLOCK) >>
> > +
> > -					PAGE_SHIFT;
>
> I haven't looked at this super closely, but how does this stuff work?
>
> do_mlock doesn't touch pinned_vm, and this doesn't touch locked_vm...
>
> Shouldn't all this be 'if (locked_vm + pinned_vm < RLIMIT_MEMLOCK)' ?
>
> Otherwise MEMLOCK is really doubled..

So this has been a problem for some time, but it's not as easy as adding them
together, see [1][2] for a start.

The locked_vm/pinned_vm issue definitely needs fixing, but all this series is
trying to do is account to the right counter.

Daniel

[1] http://lkml.kernel.org/r/20130523104154.GA23650@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[2] http://lkml.kernel.org/r/20130524140114.GK23650@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
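
To make the "MEMLOCK is really doubled" point concrete, here is a rough
user-space sketch of two counters that are each checked against the same limit
on their own. Everything named toy_* is a hypothetical stand-in made up for
illustration, not kernel code; the real paths also involve mmap_sem or atomics
and capability checks, which this ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two per-mm counters (not struct mm_struct). */
struct toy_mm {
	unsigned long locked_vm;	/* pages charged by mlock-style paths */
	unsigned long pinned_vm;	/* pages charged by pinning paths */
};

/* Charge npage to locked_vm, comparing only locked_vm against limit. */
static bool toy_charge_locked(struct toy_mm *mm, unsigned long npage,
			      unsigned long limit)
{
	if (mm->locked_vm + npage > limit)
		return false;
	mm->locked_vm += npage;
	return true;
}

/*
 * Charge npage to pinned_vm: add first, then check and roll back on
 * failure. This is a non-atomic simplification of an add-then-check
 * pattern; only pinned_vm is compared against limit.
 */
static bool toy_charge_pinned(struct toy_mm *mm, unsigned long npage,
			      unsigned long limit)
{
	mm->pinned_vm += npage;
	if (mm->pinned_vm > limit) {
		mm->pinned_vm -= npage;
		return false;
	}
	return true;
}

/* Usage: with a limit of 16 pages, both charges of 16 pages succeed. */
static unsigned long toy_demo(void)
{
	struct toy_mm mm = { 0, 0 };
	const unsigned long limit = 16;

	assert(toy_charge_locked(&mm, 16, limit));
	assert(toy_charge_pinned(&mm, 16, limit));
	/* Each counter is now full, so further charges are refused. */
	assert(!toy_charge_locked(&mm, 1, limit));
	assert(!toy_charge_pinned(&mm, 1, limit));

	return mm.locked_vm + mm.pinned_vm;	/* twice the limit */
}
```

In this toy model the task ends up with 32 pages accounted against a 16-page
limit, because neither check looks at the other counter. As noted above, simply
summing the two is not a straightforward fix; see [1][2].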