On 4/10/2023 10:11 PM, Matthew Wilcox wrote: > On Mon, Apr 10, 2023 at 04:53:19AM +0000, Yin, Fengwei wrote: >> On Tue, 2023-04-04 at 16:23 +0100, Matthew Wilcox wrote: >>> On Tue, Apr 04, 2023 at 02:58:50PM +0100, Matthew Wilcox (Oracle) >>> wrote: >>>> The map_pages fs method should be safe to run under the VMA lock >>>> instead >>>> of the mmap lock. This should have a measurable reduction in >>>> contention >>>> on the mmap lock. >>> >>> https://github.com/antonblanchard/will-it-scale/pull/37/files should >>> be a good microbenchmark to report numbers from. Obviously real- >>> world >>> benchmarks will be more compelling. >>> >> >> Test result in my side with page_fault4 of will-it-scale in thread >> mode is: >> 15274196 (without the patch) -> 17291444 (with the patch) >> >> 13.2% improvement on a Ice Lake with 48C/96T + 192G RAM + ext4 >> filesystem. > > Thanks! That is really good news. > >> The perf showed the mmap_lock contention reduced a lot: >> (Removed the grandson functions of do_user_addr_fault()) >> >> latest linux-next with the patch: >> 51.78%--do_user_addr_fault >> | >> |--49.09%--handle_mm_fault >> |--1.19%--lock_vma_under_rcu >> --1.09%--down_read >> >> latest linux-next without the patch: >> 73.65%--do_user_addr_fault >> | >> |--28.65%--handle_mm_fault >> |--17.22%--down_read_trylock >> |--10.92%--down_read >> |--9.20%--up_read >> --7.30%--find_vma >> >> My understanding is down_read_trylock, down_read and up_read all are >> related with mmap_lock. So the mmap_lock contention reduction is quite >> obvious. > > Absolutely. I'm a little surprised that find_vma() basically disappeared > from the perf results. I guess that it was cache cold after contending > on the mmap_lock rwsem. But this is a very encouraging result. Yes. find_vma() was surprise. So I did more check about it. 1. re-run the testing for more 3 times in case I made stupid mistake. All testing show same behavior for find_vma(). 2. perf report for find_vma() shows: 6.26%--find_vma | --0.66%--mt_find | --0.60%--mtree_range_walk The most time used in find_vma() is not mt_find. It's mmap_assert_locked(mm). 3. perf annotate of find_vma shows: │ ffffffffa91e9f20 <load0>: 0.07 │ nop 0.07 │ sub $0x10,%rsp 0.01 │ mov %gs:0x28,%rax 0.05 │ mov %rax,0x8(%rsp) 0.02 │ xor %eax,%eax │ mov %rsi,(%rsp) 0.00 │ mov 0x70(%rdi),%rax --> This is rwsem_is_locked(&mm->mmap_lock) 99.60 │ test %rax,%rax │ ↓ je 4e 0.09 │ mov $0xffffffffffffffff,%rdx │ mov %rsp,%rsi I believe the line "mov 0x70(%rdi),%rax" should occupy 99.60% runtime of find_vma() instead of line "test %rax,%rax". And it's accessing the mmap_lock. With this series, the mmap_lock contention is reduced a lot with page_fault4 of will-it-scale. It makes mmap_lock access fast also. Regards Yin, Fengwei