On Mon, 29 Jan 2018, Laurent Dufour wrote:
Hi, I would like to talk about the way to remove the mmap_sem contention we could see on large threaded systems.
'cause what's lsfmm without mmap_sem, right? ;)
I already resurrected the Speculative Page Fault patchset from Peter Zijlstra [1]. This series allows concurrency between the page fault handler and the other threads' activity. Running a massively threaded benchmark like ebizzy [2] on top of this kernel shows that there is an opportunity to scale far better on large systems (x2). But the SPF series addresses only one part of the issue, and there is a need to address the other part of the picture.

There were some discussions last year about range locking, but this has been put on hold, especially because it implies a huge change in the kernel, as the mmap_sem is used to protect so many resources (do we really need to protect the process command line with the mmap_sem?), and sometimes the assumption is made that the mmap_sem is protecting code against concurrency even though that code is not clearly dealing with the mmap_sem. This will be a massive change and rebasing such a series will be hard, so it may be far better to first agree on the best options to improve mmap_sem's performance and scalability. There are several additional options on the table, including range locking, multiple fine-grained locks, etc. In addition, I would like to discuss the options and the best way to make the move smooth when breaking or replacing the mmap_sem.
I'd also like to discuss this stuff. In particular I've been focusing on range locking the mm. With the range_lock primitive ready by now (including rbtree optimizations), my priority has been converting mmap_sem and getting adequate performance data for the worst case scenario (full range).

Also, fyi, recently by means of auditing handle_mm_fault() and the gup family, two new naughty users were found that were doing gup() without mmap_sem:

https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?h=wip/dl-for-next&id=487f6683f1b738e40aca2386b9f73da4ebb8223d
https://lkml.org/lkml/2018/1/22/640

With a from-scratch conversion, it's been mostly pretty straightforward, although I've done some hacks along the way. In particular, I've avoided having to teach file_operations about mmrange, which would be a gazillion times more changes than what we already have. That means removing the is_locked() check for calls like zap_pmd_range(), thp (pmd_trans_huge_lock()), and vm_insert_page() (which I audited and all ->fault() users seem to correctly set VM_MIXEDMAP, so we might be able to get rid of it, dunno).

All this said, yes, I hope to have the patches and numbers asap (way before lsfmm).
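For readers unfamiliar with the idea, here is a minimal userspace sketch of what range locking means, assuming exclusive-only ranges and a plain array instead of an rbtree-based interval tree. It is not the kernel's range_lock primitive and not the mmap_sem conversion discussed above: every name in it (struct range_lock, range_trylock(), FULL_RANGE_END, and so on) is hypothetical, and it makes no attempt at fairness or reader/writer modes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD       64      /* arbitrary cap, for this sketch only */
#define FULL_RANGE_END (~0UL)  /* end of a "full range" lock */

struct range {
	unsigned long start, end;  /* inclusive: [start, end] */
};

struct range_lock {
	pthread_mutex_t mtx;       /* protects held[] and nr_held */
	pthread_cond_t released;   /* signalled whenever a range is dropped */
	struct range held[MAX_HELD];
	size_t nr_held;
};

void range_lock_init(struct range_lock *rl)
{
	pthread_mutex_init(&rl->mtx, NULL);
	pthread_cond_init(&rl->released, NULL);
	rl->nr_held = 0;
}

static bool overlaps(const struct range *a, const struct range *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* Caller must hold rl->mtx. Linear scan; a real tree would be O(log n). */
static bool can_take(const struct range_lock *rl, const struct range *r)
{
	if (rl->nr_held == MAX_HELD)
		return false;
	for (size_t i = 0; i < rl->nr_held; i++)
		if (overlaps(&rl->held[i], r))
			return false;
	return true;
}

/* Take [r.start, r.end] if nothing held overlaps it; never blocks. */
bool range_trylock(struct range_lock *rl, struct range r)
{
	pthread_mutex_lock(&rl->mtx);
	bool ok = can_take(rl, &r);
	if (ok)
		rl->held[rl->nr_held++] = r;
	pthread_mutex_unlock(&rl->mtx);
	return ok;
}

/* Blocking variant: sleep until no held range overlaps r. */
void range_lock(struct range_lock *rl, struct range r)
{
	pthread_mutex_lock(&rl->mtx);
	while (!can_take(rl, &r))
		pthread_cond_wait(&rl->released, &rl->mtx);
	rl->held[rl->nr_held++] = r;
	pthread_mutex_unlock(&rl->mtx);
}

/* Drop a range previously taken with exactly the same bounds. */
void range_unlock(struct range_lock *rl, struct range r)
{
	bool found = false;

	pthread_mutex_lock(&rl->mtx);
	for (size_t i = 0; i < rl->nr_held; i++) {
		if (rl->held[i].start == r.start && rl->held[i].end == r.end) {
			rl->held[i] = rl->held[--rl->nr_held];
			found = true;
			break;
		}
	}
	assert(found);  /* unlocking a range that was never taken is a bug */
	pthread_cond_broadcast(&rl->released);
	pthread_mutex_unlock(&rl->mtx);
}
```

In this sketch two disjoint ranges such as {0, 4095} and {4096, 8191} can be held at the same time, while a {0, FULL_RANGE_END} request conflicts with every other holder. That degenerate case behaves like one big lock with extra bookkeeping, which is why full range is the worst case to measure.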
People (sorry if I missed someone):
Andrea Arcangeli
Davidlohr Bueso
Michal Hocko
Anshuman Khandual
Andi Kleen
Andrew Morton
Matthew Wilcox
Peter Zijlstra
I'd also add Kirill A. Shutemov.

Thanks,
Davidlohr