[PATCH rdma-next] RDMA/core: Acquire and release mmap_sem on page range

From: Parav Pandit <parav@xxxxxxxxxxxx>

Currently mmap_sem is read-locked for the whole duration of memory
pinning. In a multi-threaded process, holding the lock for that long
creates contention with other threads that might be registering memory,
creating QPs or simply calling mmap(), since those operations also need
to acquire mmap_sem (mmap() takes it for write).

None of these operations can make forward progress until the in-flight
pinning operation completes; since a waiting writer also blocks
subsequent readers of an rwsem, a single long read hold can stall the
whole process.
It becomes even worse when memory is also being unpinned and/or the
memory registration is large (in the GB range).

Therefore, instead of holding mmap_sem for too long (for the whole
region pinning), acquire and release the lock for every chunk of
PAGE_SIZE / sizeof(struct page *) pages. On x86-64 with a 4K page size
and 8-byte pointers, that is 4096 / 8 = 512 pages, i.e. mmap_sem is
acquired and released for every 2 MB memory chunk.

This allows competing threads that only need mmap_sem for a short
duration to make progress in between chunks.
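
For illustration, the resulting pattern looks roughly like the sketch
below (simplified, not the exact umem code; pin_chunk() is a
hypothetical stand-in for get_user_pages_longterm() plus the sg-list
bookkeeping):

	/*
	 * Pin a large user range in fixed-size chunks, taking and
	 * dropping mmap_sem around each chunk so that writers (e.g.
	 * mmap()) can make progress between chunks.
	 */
	while (npages) {
		unsigned long chunk = min_t(unsigned long, npages,
					    PAGE_SIZE / sizeof(struct page *));

		down_read(&mm->mmap_sem);
		/* pin_chunk() is a hypothetical helper for this sketch */
		ret = pin_chunk(cur_base, chunk, page_list);
		up_read(&mm->mmap_sem);
		if (ret < 0)
			goto err;

		cur_base += ret * PAGE_SIZE;
		npages   -= ret;
	}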

When memory registration latency is measured using [1] for memory sizes
ranging from 4K to 48GB, a degradation of 1% or less (often around
0.5%) is observed. In many runs no difference beyond run-to-run
variance is seen.

In other targeted tests by users with large memory registrations, the
desired improvement is seen due to reduced mmap_sem contention.

[1] https://github.com/paravmellanox/rtool

./rdma_resource_lat -c 1 -s 48G -a -u L -i 500 -A
This registers pinned memory of sizes from 4K to 48GB, with 500
iterations for each size.

./rdma_resource_lat -c 1 -s 12G -a -u L -i 500 -t 4
Four competing threads pin memory, each registering a 12GB region, with
500 iterations.

Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
---
 drivers/infiniband/core/umem.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index fec5d489e311..c114f22d3ef9 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -180,8 +180,8 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	sg_list_start = umem->sg_head.sgl;
 
-	down_read(&mm->mmap_sem);
 	while (npages) {
+		down_read(&mm->mmap_sem);
 		ret = get_user_pages_longterm(cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
@@ -195,17 +195,20 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		cur_base += ret * PAGE_SIZE;
 		npages   -= ret;
 
+		/* Continue to hold the mmap_sem as vma_list access
+		 * needs to be protected.
+		 */
 		for_each_sg(sg_list_start, sg, ret, i) {
 			if (vma_list && !is_vm_hugetlb_page(vma_list[i]))
 				umem->hugetlb = 0;
 
 			sg_set_page(sg, page_list[i], PAGE_SIZE, 0);
 		}
+		up_read(&mm->mmap_sem);
 
 		/* preparing for next loop */
 		sg_list_start = sg;
 	}
-	up_read(&mm->mmap_sem);
 
 	umem->nmap = ib_dma_map_sg_attrs(context->device,
 				  umem->sg_head.sgl,
-- 
2.14.4



