On Thu, 6 Jan 2022 08:34:56 -0400 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> On Wed, Jan 05, 2022 at 08:17:08PM -0500, Daniel Jordan wrote:
> > On Wed, Jan 05, 2022 at 08:53:39PM -0400, Jason Gunthorpe wrote:
> > > On Wed, Jan 05, 2022 at 07:46:48PM -0500, Daniel Jordan wrote:
> > > > padata threads hold mmap_lock as reader for the majority of their
> > > > runtime in order to call pin_user_pages_remote(), but they also
> > > > periodically take mmap_lock as writer for short periods to adjust
> > > > mm->locked_vm, hurting parallelism.
> > > >
> > > > Alleviate the write-side contention with a per-thread cache of
> > > > locked_vm which allows taking mmap_lock as writer far less
> > > > frequently.
> > > >
> > > > Failure to refill the cache due to insufficient locked_vm will not
> > > > cause the entire pinning operation to error out.  This avoids
> > > > spurious failure in case some pinned pages aren't accounted to
> > > > locked_vm.
> > > >
> > > > Cache size is limited to provide some protection in the unlikely
> > > > event of a concurrent locked_vm accounting operation in the same
> > > > address space needlessly failing in case the cache takes more
> > > > locked_vm than it needs.
> > >
> > > Why not just do the pinned page accounting once at the start?  Why
> > > does it have to be done incrementally?
> >
> > Yeah, good question.  I tried doing it that way recently and it did
> > improve performance a bit, but I thought it wasn't enough of a gain
> > to justify how it overaccounted by the size of the entire pin.
>
> Why would it over account?

We'd be guessing that the entire virtual address mapping counts against
locked memory limits, but it might include PFNMAP pages or pages that
are already accounted via the page pinning interface that mdev devices
use.  At that point we're gambling that the user isn't concurrently
doing something else that could fail as a result of pre-accounting and
fix-up-later schemes like this.  Thanks,

Alex
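
[A rough, hypothetical sketch of the per-thread locked_vm cache debated
above.  The names (lvm_cache, LVM_CACHE_MAX, lvm_cache_refill,
lvm_cache_charge) and the exact policy are illustrative assumptions,
not code from Daniel's patch: locked_vm is charged in bounded batches
under mmap_lock held for write, then consumed without the lock while
pinning, so the write lock is taken once per batch rather than once per
chunk of pages.]

#include <linux/mm.h>
#include <linux/sched/signal.h>

/* Cap, in pages, on locked_vm charged but not yet consumed, limiting
 * how badly a concurrent accounting operation in the same mm can be
 * needlessly starved (hypothetical value). */
#define LVM_CACHE_MAX	512UL

struct lvm_cache {
	struct mm_struct *mm;
	unsigned long	  pages;  /* prepaid locked_vm, in pages */
};

/*
 * Refill the cache under mmap_lock held for write.  Charges as much as
 * RLIMIT_MEMLOCK allows rather than failing outright; returns the
 * number of pages actually charged.
 */
static unsigned long lvm_cache_refill(struct lvm_cache *cache,
				      unsigned long want)
{
	unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
	unsigned long got;

	want = min(want, LVM_CACHE_MAX);

	mmap_write_lock(cache->mm);
	got = min(want, limit - min(limit, cache->mm->locked_vm));
	cache->mm->locked_vm += got;
	mmap_write_unlock(cache->mm);

	cache->pages += got;
	return got;
}

/*
 * Consume prepaid pages for a chunk about to be pinned; mmap_lock is
 * only retaken when the cache runs dry.  Per the cover letter's
 * description, a caller could treat a false return as "continue
 * unaccounted" (e.g. for PFNMAP pages) rather than failing the pin.
 */
static bool lvm_cache_charge(struct lvm_cache *cache, unsigned long npages)
{
	while (cache->pages < npages)
		if (!lvm_cache_refill(cache, npages - cache->pages))
			return false;  /* over RLIMIT_MEMLOCK */

	cache->pages -= npages;
	return true;
}

[Alex's objection maps onto the refill step: accounting the whole pin
once at the start is the degenerate case where the batch size equals
the size of the entire pin, which over-accounts whenever part of the
range is PFNMAP or already accounted elsewhere.]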