
The patch titled
     Subject: mm: brk: downgrade mmap_sem to read when shrinking
has been added to the -mm tree.  Its filename is
     mm-brk-downgrade-mmap_sem-to-read-when-shrinking.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-brk-downgrade-mmap_sem-to-read-when-shrinking.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-brk-downgrade-mmap_sem-to-read-when-shrinking.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
Subject: mm: brk: downgrade mmap_sem to read when shrinking

Besides munmap(), brk() can also be used to shrink a memory mapping.  So
it may hold mmap_sem for write for a long time when shrinking a large
mapping, as described in commit ("mm: mmap: zap pages with read mmap_sem
in munmap").

In the shrink case, brk() does not manipulate vmas any more after the
__do_munmap() call.  However, it may set mm->brk after __do_munmap(),
which requires holding mmap_sem for write.

A simple trick works around this: set mm->brk before calling
__do_munmap(), then restore the original value if __do_munmap() fails.
With this trick, it is safe to downgrade mmap_sem to read.

So the same optimization, which downgrades mmap_sem to read for zapping
pages, is also feasible and reasonable for this case.

The time spent holding mmap_sem exclusively while shrinking a large
mapping would be reduced significantly with this optimization.

Link: http://lkml.kernel.org/r/1538067582-60038-2-git-send-email-yang.shi@xxxxxxxxxxxxxxxxx
Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Laurent Dufour <ldufour@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/mmap.c |   43 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 11 deletions(-)

--- a/mm/mmap.c~mm-brk-downgrade-mmap_sem-to-read-when-shrinking
+++ a/mm/mmap.c
@@ -191,16 +191,19 @@ static int do_brk_flags(unsigned long ad
 SYSCALL_DEFINE1(brk, unsigned long, brk)
 {
 	unsigned long retval;
-	unsigned long newbrk, oldbrk;
+	unsigned long newbrk, oldbrk, origbrk;
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *next;
 	unsigned long min_brk;
 	bool populate;
+	bool downgraded = false;
 	LIST_HEAD(uf);
 
 	if (down_write_killable(&mm->mmap_sem))
 		return -EINTR;
 
+	origbrk = mm->brk;
+
 #ifdef CONFIG_COMPAT_BRK
 	/*
 	 * CONFIG_COMPAT_BRK can still be overridden by setting
@@ -229,14 +232,29 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 
 	newbrk = PAGE_ALIGN(brk);
 	oldbrk = PAGE_ALIGN(mm->brk);
-	if (oldbrk == newbrk)
-		goto set_brk;
+	if (oldbrk == newbrk) {
+		mm->brk = brk;
+		goto success;
+	}
 
-	/* Always allow shrinking brk. */
+	/*
+	 * Always allow shrinking brk.
+	 * __do_munmap() may downgrade mmap_sem to read.
+	 */
 	if (brk <= mm->brk) {
-		if (!do_munmap(mm, newbrk, oldbrk-newbrk, &uf))
-			goto set_brk;
-		goto out;
+		/*
+		 * mm->brk need to be protected by write mmap_sem, update it
+		 * before downgrading mmap_sem.
+		 * When __do_munmap fail, it will be restored from origbrk.
+		 */
+		mm->brk = brk;
+		retval = __do_munmap(mm, newbrk, oldbrk-newbrk, &uf, true);
+		if (retval < 0) {
+			mm->brk = origbrk;
+			goto out;
+		} else if (retval == 1)
+			downgraded = true;
+		goto success;
 	}
 
 	/* Check against existing mmap mappings. */
@@ -247,18 +265,21 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 	/* Ok, looks good - let it rip. */
 	if (do_brk_flags(oldbrk, newbrk-oldbrk, 0, &uf) < 0)
 		goto out;
-
-set_brk:
 	mm->brk = brk;
+
+success:
 	populate = newbrk > oldbrk && (mm->def_flags & VM_LOCKED) != 0;
-	up_write(&mm->mmap_sem);
+	if (downgraded)
+		up_read(&mm->mmap_sem);
+	else
+		up_write(&mm->mmap_sem);
 	userfaultfd_unmap_complete(mm, &uf);
 	if (populate)
 		mm_populate(oldbrk, newbrk - oldbrk);
 	return brk;
 
 out:
-	retval = mm->brk;
+	retval = origbrk;
 	up_write(&mm->mmap_sem);
 	return retval;
 }
_

Patches currently in -mm which might be from yang.shi@xxxxxxxxxxxxxxxxx are

mm-mmap-zap-pages-with-read-mmap_sem-in-munmap.patch
mm-unmap-vm_hugetlb-mappings-with-optimized-path.patch
mm-unmap-vm_pfnmap-mappings-with-optimized-path.patch
mm-mremap-downgrade-mmap_sem-to-read-when-shrinking.patch
mm-brk-downgrade-mmap_sem-to-read-when-shrinking.patch
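
For readers who want to follow the ordering argument in the changelog
outside the kernel, here is a minimal userspace sketch of the shrink path.
It is an illustration under stated assumptions, not kernel code: model_mm,
model_unmap() and model_brk_shrink() are hypothetical names, and mmap_sem is
modeled as a plain state variable instead of a real rw_semaphore.  The
return convention of model_unmap() (negative on failure, 0 on success, 1 on
success with the lock downgraded) mirrors what the diff checks after
__do_munmap().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum lock_state { UNLOCKED, HELD_WRITE, HELD_READ };

struct model_mm {
	unsigned long brk;	/* stands in for mm->brk */
	enum lock_state lock;	/* stands in for the state of mmap_sem */
};

/*
 * Stand-in for __do_munmap(..., downgrade = true).  The caller picks the
 * outcome: negative means failure (lock still held for write), 0 means
 * success without downgrade, 1 means success after downgrading to read.
 */
int model_unmap(struct model_mm *mm, int outcome)
{
	assert(mm->lock == HELD_WRITE);
	if (outcome == 1)
		mm->lock = HELD_READ;
	return outcome;
}

/* Shrink the break to 'brk'; returns the break that is in effect afterwards. */
unsigned long model_brk_shrink(struct model_mm *mm, unsigned long brk,
			       int outcome)
{
	unsigned long origbrk;
	bool downgraded = false;
	int ret;

	mm->lock = HELD_WRITE;		/* down_write_killable() in the patch */
	origbrk = mm->brk;

	/* Update the break while the lock is still held exclusively. */
	mm->brk = brk;
	ret = model_unmap(mm, outcome);
	if (ret < 0) {
		/* Failure path: the lock is still held for write, so restore. */
		assert(mm->lock == HELD_WRITE);
		mm->brk = origbrk;
		mm->lock = UNLOCKED;	/* up_write() */
		return origbrk;
	}
	if (ret == 1)
		downgraded = true;

	/* Release the lock in whichever mode it is currently held. */
	assert(mm->lock == (downgraded ? HELD_READ : HELD_WRITE));
	mm->lock = UNLOCKED;		/* up_read() or up_write() */
	return brk;
}
```

In this model, no store to the brk field happens while the lock is in the
HELD_READ state: the new value is written before model_unmap() can
downgrade, and the restore only runs on the failure path, where the lock is
still held for write.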