On Wed, Jan 05, 2022 at 08:53:39PM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 05, 2022 at 07:46:48PM -0500, Daniel Jordan wrote:
> > padata threads hold mmap_lock as reader for the majority of their
> > runtime in order to call pin_user_pages_remote(), but they also
> > periodically take mmap_lock as writer for short periods to adjust
> > mm->locked_vm, hurting parallelism.
> >
> > Alleviate the write-side contention with a per-thread cache of locked_vm
> > which allows taking mmap_lock as writer far less frequently.
> >
> > Failure to refill the cache due to insufficient locked_vm will not cause
> > the entire pinning operation to error out. This avoids spurious failure
> > in case some pinned pages aren't accounted to locked_vm.
> >
> > Cache size is limited to provide some protection in the unlikely event
> > of a concurrent locked_vm accounting operation in the same address space
> > needlessly failing in case the cache takes more locked_vm than it needs.
>
> Why not just do the pinned page accounting once at the start? Why does
> it have to be done incrementally?

Yeah, good question. I tried doing it that way recently and it did
improve performance a bit, but I thought it wasn't enough of a gain to
justify how it overaccounted by the size of the entire pin.

If the concurrent accounting I worried about above isn't really a
concern, though, I can reconsider this.
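
For illustration only, here is a rough userspace sketch of the cached
accounting idea discussed above. It is not the code from the patch. All
names (struct lvm_account, struct lvm_cache, lvm_reserve,
lvm_cache_charge, lvm_cache_drain, LVM_CACHE_MAX) are hypothetical, a
plain counter stands in for mm->locked_vm, and a counter of acquisitions
stands in for taking mmap_lock as writer. The cap value of 512 pages is
an assumption chosen for the example.

```c
#include <assert.h>

/* Hypothetical stand-in for the parts of the mm that matter here. */
struct lvm_account {
	unsigned long locked_vm;   /* pages currently accounted */
	unsigned long limit;       /* stand-in for the memlock limit, in pages */
	unsigned long write_locks; /* how often the "write lock" was taken */
};

/* Hypothetical per-thread cache of pages accounted but not yet consumed. */
struct lvm_cache {
	unsigned long cached;
};

/* Assumed cap on how far one refill may round a small shortfall up. */
#define LVM_CACHE_MAX 512UL

/* Reserve up to 'want' pages; models one write-side lock/unlock cycle. */
unsigned long lvm_reserve(struct lvm_account *acct, unsigned long want)
{
	unsigned long room, got;

	acct->write_locks++;
	room = acct->limit > acct->locked_vm ? acct->limit - acct->locked_vm : 0;
	got = want < room ? want : room;
	acct->locked_vm += got;
	return got;
}

/*
 * Account 'npages' for one pinned chunk, refilling the cache only when it
 * runs short.  Returns the number of pages that could not be accounted;
 * a short refill is reported rather than treated as a hard failure.
 */
unsigned long lvm_cache_charge(struct lvm_account *acct,
			       struct lvm_cache *cache, unsigned long npages)
{
	unsigned long used;

	if (cache->cached < npages) {
		unsigned long want = npages - cache->cached;

		if (want < LVM_CACHE_MAX)
			want = LVM_CACHE_MAX;
		cache->cached += lvm_reserve(acct, want);
	}

	used = npages < cache->cached ? npages : cache->cached;
	cache->cached -= used;
	return npages - used;
}

/* Give back whatever the thread reserved but never consumed. */
void lvm_cache_drain(struct lvm_account *acct, struct lvm_cache *cache)
{
	if (!cache->cached)
		return;

	assert(acct->locked_vm >= cache->cached);
	acct->write_locks++;
	acct->locked_vm -= cache->cached;
	cache->cached = 0;
}
```

In this sketch, charging 64 chunks of 8 pages takes the write side once
instead of 64 times, and the unconsumed reservation held by a thread is
whatever remains of its last refill. Accounting once at the start, as
suggested in the quoted question, would also take the write side once
but would reserve the whole pin up front, which is the overaccounting
mentioned in the reply.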