On Wed, Jan 18, 2023 at 5:15 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>
> The quilt patch titled
>      Subject: mm/thp: check and bail out if page in deferred queue already
> has been removed from the -mm tree.  Its filename was
>      mm-thp-check-and-bail-out-if-page-in-deferred-queue-already.patch
>
> This patch was dropped because it was merged into the mm-stable branch
> of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>
> ------------------------------------------------------
> From: Yin Fengwei <fengwei.yin@xxxxxxxxx>
> Subject: mm/thp: check and bail out if page in deferred queue already
> Date: Fri, 23 Dec 2022 21:52:07 +0800
>
> Kernel build regression with LLVM was reported here:
> https://lore.kernel.org/all/Y1GCYXGtEVZbcv%2F5@dev-arch.thelio-3990X/ with
> commit f35b5d7d676e ("mm: align larger anonymous mappings on THP
> boundaries").  And the commit f35b5d7d676e was reverted.
>
> It turned out the regression is related with madvise(MADV_DONTNEED)
> was used by ld.lld. But with none PMD_SIZE aligned parameter len.
> trace-bpfcc captured:
> 531607  531732  ld.lld  do_madvise.part.0 start: 0x7feca9000000, len: 0x7fb000, behavior: 0x4
> 531607  531793  ld.lld  do_madvise.part.0 start: 0x7fec86a00000, len: 0x7fb000, behavior: 0x4

This reminds me: should we reinstate Rik's commit?

>
> If the underneath physical page is THP, the madvise(MADV_DONTNEED) can
> trigger split_queue_lock contention raised significantly. perf showed
> following data:
>     14.85%     0.00%  ld.lld  [kernel.kallsyms]  [k]
>        entry_SYSCALL_64_after_hwframe
>            11.52%
>                 entry_SYSCALL_64_after_hwframe
>                 do_syscall_64
>                 __x64_sys_madvise
>                 do_madvise.part.0
>                 zap_page_range
>                 unmap_single_vma
>                 unmap_page_range
>                 page_remove_rmap
>                 deferred_split_huge_page
>                 __lock_text_start
>                 native_queued_spin_lock_slowpath
>
> If THP can't be removed from rmap as whole THP, partial THP will be
> removed from rmap by removing sub-pages from rmap.  Even the THP head page
> is added to deferred queue already, the split_queue_lock will be acquired
> and check whether the THP head page is in the queue already.  Thus, the
> contention of split_queue_lock is raised.
>
> Before acquire split_queue_lock, check and bail out early if the THP
> head page is in the queue already. The checking without holding
> split_queue_lock could race with deferred_split_scan, but it doesn't
> impact the correctness here.
>
> Test result of building kernel with ld.lld:
> commit 7b5a0b664ebe (parent commit of f35b5d7d676e):
> time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
>         6:07.99 real,   26367.77 user,  5063.35 sys
>
> commit f35b5d7d676e:
> time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
>         7:22.15 real,   26235.03 user,  12504.55 sys
>
> commit f35b5d7d676e with the fixing patch:
> time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
>         6:08.49 real,   26520.15 user,  5047.91 sys
>
> Link: https://lkml.kernel.org/r/20221223135207.2275317-1-fengwei.yin@xxxxxxxxx
> Signed-off-by: Yin Fengwei <fengwei.yin@xxxxxxxxx>
> Tested-by: Nathan Chancellor <nathan@xxxxxxxxxx>
> Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
> Reviewed-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
> Cc: Feng Tang <feng.tang@xxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxxx>
> Cc: Xing Zhengjun <zhengjun.xing@xxxxxxxxxxxxxxx>
> Cc: Yang Shi <shy828301@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
>  mm/huge_memory.c |    3 +++
>  1 file changed, 3 insertions(+)
>
> --- a/mm/huge_memory.c~mm-thp-check-and-bail-out-if-page-in-deferred-queue-already
> +++ a/mm/huge_memory.c
> @@ -2835,6 +2835,9 @@ void deferred_split_huge_page(struct pag
>  	if (PageSwapCache(page))
>  		return;
>
> +	if (!list_empty(page_deferred_list(page)))
> +		return;
> +
>  	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>  	if (list_empty(page_deferred_list(page))) {
>  		count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> _
>
> Patches currently in -mm which might be from fengwei.yin@xxxxxxxxx are
>
>
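
For reference, the arithmetic behind the "none PMD_SIZE aligned" len in the quoted trace can be checked with a small userspace sketch. This is not kernel code. It assumes x86-64 defaults (4 KiB base pages, 2 MiB PMD-sized THPs), and the SKETCH_* macros and both helper names are made up for illustration. It also assumes a PMD-aligned start, which holds for both addresses in the trace.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 defaults, not taken from the patch itself. */
#define SKETCH_PAGE_SIZE 0x1000ULL   /* 4 KiB base page */
#define SKETCH_PMD_SIZE  0x200000ULL /* 2 MiB PMD-sized THP */

/* Hypothetical helper: PMD-sized regions fully covered by [start, start + len). */
static uint64_t full_thp_count(uint64_t start, uint64_t len)
{
	/* This sketch only handles a PMD-aligned start, as in the trace. */
	assert(start % SKETCH_PMD_SIZE == 0);
	return len / SKETCH_PMD_SIZE;
}

/* Hypothetical helper: base pages covered in the last, partially covered region. */
static uint64_t tail_subpages(uint64_t start, uint64_t len)
{
	assert(start % SKETCH_PMD_SIZE == 0);
	return (len % SKETCH_PMD_SIZE) / SKETCH_PAGE_SIZE;
}
```

With start 0x7feca9000000 and len 0x7fb000 this gives three fully covered regions plus 507 of the 512 base pages in a fourth. If that fourth region is backed by a THP, it cannot be removed from rmap as a whole, which matches the per-sub-page path into deferred_split_huge_page() described in the changelog.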