The patch titled Subject: hugetlbfs: take read_lock on i_mmap for PMD sharing has been added to the -mm tree. Its filename is hugetlbfs-take-read_lock-on-i_mmap-for-pmd-sharing.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/hugetlbfs-take-read_lock-on-i_mmap-for-pmd-sharing.patch and later at http://ozlabs.org/~akpm/mmotm/broken-out/hugetlbfs-take-read_lock-on-i_mmap-for-pmd-sharing.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Waiman Long <longman@xxxxxxxxxx> Subject: hugetlbfs: take read_lock on i_mmap for PMD sharing A customer with large SMP systems (up to 16 sockets) with application that uses large amount of static hugepages (~500-1500GB) are experiencing random multisecond delays. These delays were caused by the long time it took to scan the VMA interval tree with mmap_sem held. The sharing of huge PMD does not require changes to the i_mmap at all. Therefore, we can just take the read lock and let other threads searching for the right VMA share it in parallel. Once the right VMA is found, either the PMD lock (2M huge page for x86-64) or the mm->page_table_lock will be acquired to perform the actual PMD sharing. Lock contention, if present, will happen in the spinlock. That is much better than contention in the rwsem where the time needed to scan the the interval tree is indeterminate. With this patch applied, the customer is seeing significant performance improvement over the unpatched kernel. Link: http://lkml.kernel.org/r/20191107211809.9539-1-longman@xxxxxxxxxx Signed-off-by: Waiman Long <longman@xxxxxxxxxx> Suggested-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> Cc: Ingo Molnar <mingo@xxxxxxxxxx> Cc: Will Deacon <will.deacon@xxxxxxx> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- mm/hugetlb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/mm/hugetlb.c~hugetlbfs-take-read_lock-on-i_mmap-for-pmd-sharing +++ a/mm/hugetlb.c @@ -4769,7 +4769,7 @@ pte_t *huge_pmd_share(struct mm_struct * if (!vma_shareable(vma, addr)) return (pte_t *)pmd_alloc(mm, pud, addr); - i_mmap_lock_write(mapping); + i_mmap_lock_read(mapping); vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { if (svma == vma) continue; @@ -4799,7 +4799,7 @@ pte_t *huge_pmd_share(struct mm_struct * spin_unlock(ptl); out: pte = (pte_t *)pmd_alloc(mm, pud, addr); - i_mmap_unlock_write(mapping); + i_mmap_unlock_read(mapping); return pte; } _ Patches currently in -mm which might be from longman@xxxxxxxxxx are hugetlbfs-take-read_lock-on-i_mmap-for-pmd-sharing.patch