On 12/28/2022 4:15 AM, Nathan Chancellor wrote:
> On Fri, Dec 23, 2022 at 09:52:07PM +0800, Yin Fengwei wrote:
>> Kernel build regression with LLVM was reported here:
>> https://lore.kernel.org/all/Y1GCYXGtEVZbcv%2F5@dev-arch.thelio-3990X/
>> with commit f35b5d7d676e ("mm: align larger anonymous mappings on THP
>> boundaries"). And the commit f35b5d7d676e was reverted.
>>
>> It turned out the regression is related to ld.lld calling
>> madvise(MADV_DONTNEED) with a len parameter that is not PMD_SIZE
>> aligned. trace-bpfcc captured:
>> 531607  531732  ld.lld  do_madvise.part.0 start: 0x7feca9000000, len: 0x7fb000, behavior: 0x4
>> 531607  531793  ld.lld  do_madvise.part.0 start: 0x7fec86a00000, len: 0x7fb000, behavior: 0x4
>>
>> If the underlying physical page is a THP, madvise(MADV_DONTNEED) can
>> significantly increase split_queue_lock contention. perf showed the
>> following data:
>>     14.85%     0.00%  ld.lld  [kernel.kallsyms]  [k]
>>        entry_SYSCALL_64_after_hwframe
>>            11.52%
>>                 entry_SYSCALL_64_after_hwframe
>>                 do_syscall_64
>>                 __x64_sys_madvise
>>                 do_madvise.part.0
>>                 zap_page_range
>>                 unmap_single_vma
>>                 unmap_page_range
>>                 page_remove_rmap
>>                 deferred_split_huge_page
>>                 __lock_text_start
>>                 native_queued_spin_lock_slowpath
>>
>> If a THP can't be removed from rmap as a whole, it is removed
>> partially, by removing its sub-pages from rmap one by one. Even if the
>> THP head page has already been added to the deferred queue,
>> split_queue_lock is still acquired just to check whether the head page
>> is in the queue. Thus, the contention on split_queue_lock rises.
>>
>> Before acquiring split_queue_lock, check and bail out early if the THP
>> head page is in the queue already. The check without holding
>> split_queue_lock could race with deferred_split_scan, but it doesn't
>> impact the correctness here.
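
For readers following the thread without the diff at hand, the early
bail-out described above can be sketched in user space roughly as below.
This is only an analogy, not the patch: the struct and function names are
made up for illustration, a C11 atomic_flag spinlock stands in for
split_queue_lock, and the lock_acquisitions counter exists only to make
the effect visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a THP head page; not a kernel type. */
struct item {
    atomic_bool queued;              /* true once the item is on the queue */
};

/* Hypothetical stand-in for the deferred split queue. */
struct deferred_queue {
    atomic_flag lock;                /* plays the role of split_queue_lock */
    unsigned long len;               /* number of queued items */
    unsigned long lock_acquisitions; /* demo counter, protected by lock */
};

static void queue_lock(struct deferred_queue *q)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
}

static void queue_unlock(struct deferred_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static void deferred_enqueue(struct deferred_queue *q, struct item *it)
{
    /* Unlocked fast path: bail out if the item is already queued. */
    if (atomic_load_explicit(&it->queued, memory_order_relaxed))
        return;

    queue_lock(q);
    q->lock_acquisitions++;
    /* Recheck under the lock, since another thread may have queued it. */
    if (!atomic_load_explicit(&it->queued, memory_order_relaxed)) {
        atomic_store_explicit(&it->queued, true, memory_order_relaxed);
        q->len++;
    }
    queue_unlock(q);
}

/*
 * Simulate removing `subpages` sub-pages of one huge page: every removal
 * tries to queue the same head item. Returns how often the lock was taken.
 */
static unsigned long simulate_partial_unmap(unsigned int subpages)
{
    struct deferred_queue q = { .lock = ATOMIC_FLAG_INIT };
    struct item head = { .queued = false };

    for (unsigned int i = 0; i < subpages; i++)
        deferred_enqueue(&q, &head);

    assert(subpages == 0 || q.len == 1);
    return q.lock_acquisitions;
}
```

In this sketch, simulate_partial_unmap(512) takes the lock once; without
the unlocked check it would take it 512 times, once per sub-page.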
>>
>> Test result of building kernel with ld.lld:
>> commit 7b5a0b664ebe (parent commit of f35b5d7d676e):
>> time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
>>         6:07.99 real,   26367.77 user,  5063.35 sys
>>
>> commit f35b5d7d676e:
>> time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
>>         7:22.15 real,   26235.03 user,  12504.55 sys
>>
>> commit f35b5d7d676e with the fixing patch:
>> time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
>>         6:08.49 real,   26520.15 user,  5047.91 sys
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@xxxxxxxxx>
>
> I cannot say whether or not this is a good idea, but it does
> resolve the regression I reported:
>
>   Benchmark 1: x86_64 allmodconfig (GCC + ld.lld) @ 1b929c02afd3 ("Linux 6.2-rc1") on 6.0.0-rc3-debug-00016-g7b5a0b664ebe
>     Time (mean ± σ):     383.003 s ±  0.680 s    [User: 34737.850 s, System: 7287.079 s]
>     Range (min … max):   382.218 s … 383.413 s    3 runs
>
>   Benchmark 1: x86_64 allmodconfig (GCC + ld.lld) @ 1b929c02afd3 ("Linux 6.2-rc1") on 6.0.0-rc3-debug-00017-gf35b5d7d676e
>     Time (mean ± σ):     437.886 s ±  1.030 s    [User: 35888.658 s, System: 14048.871 s]
>     Range (min … max):   436.865 s … 438.924 s    3 runs
>
>   Benchmark 1: x86_64 allmodconfig (GCC + ld.lld) @ 1b929c02afd3 ("Linux 6.2-rc1") on 6.0.0-rc3-debug-00017-gf35b5d7d676e-dirty
>     Time (mean ± σ):     384.371 s ±  1.004 s    [User: 35402.880 s, System: 6401.691 s]
>     Range (min … max):   383.547 s … 385.489 s    3 runs
>
> Tested-by: Nathan Chancellor <nathan@xxxxxxxxxx>

Thanks for testing the patch.

Regards
Yin, Fengwei
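
P.S. As a footnote to the trace quoted above, here is a small sketch of
the alignment arithmetic behind the two madvise() calls. It assumes 4 KiB
base pages and a 2 MiB PMD_SIZE, as is usual on x86_64; the SKETCH_*
macros and the helper names are made up for illustration and are not
kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000ULL    /* assumption: 4 KiB base pages */
#define SKETCH_PMD_SIZE  0x200000ULL  /* assumption: 2 MiB PMD mappings */

/* How a length splits into whole PMD-sized chunks plus a tail. */
struct range_split {
    uint64_t full_pmds;   /* number of complete PMD-sized chunks */
    uint64_t tail_bytes;  /* leftover bytes in the last, partial chunk */
    uint64_t tail_pages;  /* the same leftover, in base pages */
};

/* True if the value is a multiple of the assumed PMD size. */
static int is_pmd_aligned(uint64_t v)
{
    return (v & (SKETCH_PMD_SIZE - 1)) == 0;
}

static struct range_split split_range(uint64_t len)
{
    struct range_split r;

    r.full_pmds  = len / SKETCH_PMD_SIZE;
    r.tail_bytes = len % SKETCH_PMD_SIZE;
    r.tail_pages = r.tail_bytes / SKETCH_PAGE_SIZE;
    return r;
}
```

Under these assumptions both start addresses in the trace are PMD
aligned, while len 0x7fb000 covers three full 2 MiB chunks plus a tail of
0x1fb000 bytes (507 base pages). If that last chunk is backed by a THP,
the range covers it only partially, which matches the partial-removal
case described in the quoted commit message.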