Re: [merged mm-stable] mm-thp-check-and-bail-out-if-page-in-deferred-queue-already.patch removed from -mm tree

Yang Shi <shy828301@xxxxxxxxx> · Wed, 18 Jan 2023 17:31:48 -0800

On Wed, Jan 18, 2023 at 5:15 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>
> The quilt patch titled
>      Subject: mm/thp: check and bail out if page in deferred queue already
> has been removed from the -mm tree.  Its filename was
>      mm-thp-check-and-bail-out-if-page-in-deferred-queue-already.patch
>
> This patch was dropped because it was merged into the mm-stable branch
> of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>
> ------------------------------------------------------
> From: Yin Fengwei <fengwei.yin@xxxxxxxxx>
> Subject: mm/thp: check and bail out if page in deferred queue already
> Date: Fri, 23 Dec 2022 21:52:07 +0800
>
> A kernel build regression with LLVM was reported here:
> https://lore.kernel.org/all/Y1GCYXGtEVZbcv%2F5@dev-arch.thelio-3990X/ with
> commit f35b5d7d676e ("mm: align larger anonymous mappings on THP
> boundaries"), and commit f35b5d7d676e was subsequently reverted.
>
> It turned out that the regression is related to ld.lld calling
> madvise(MADV_DONTNEED) with a len parameter that is not PMD_SIZE
> aligned.  trace-bpfcc captured:
> 531607  531732  ld.lld          do_madvise.part.0 start: 0x7feca9000000, len: 0x7fb000, behavior: 0x4
> 531607  531793  ld.lld          do_madvise.part.0 start: 0x7fec86a00000, len: 0x7fb000, behavior: 0x4
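
To make the misalignment in the trace concrete, here is a small sketch of the arithmetic. It assumes x86-64 with 4 KiB base pages and a 2 MiB PMD_SIZE (neither is stated in the trace), and the names split_range and zap_span are hypothetical, not kernel symbols. Under those assumptions both start addresses are PMD aligned, and len 0x7fb000 is 3 * 0x200000 + 0x1fb000, so each call covers three whole THPs plus 507 of the 512 base pages of a fourth.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (not taken from the trace): x86-64, 4 KiB pages, 2 MiB PMDs. */
#define ASSUMED_PAGE_SIZE UINT64_C(0x1000)
#define ASSUMED_PMD_SIZE  UINT64_C(0x200000)

/* Hypothetical helper type: how a PMD-aligned range splits up. */
struct zap_span {
	uint64_t full_thps;      /* THPs covered entirely by the range */
	uint64_t tail_subpages;  /* base pages covered in a partially covered THP */
};

/* Split [start, start + len) into whole THPs and a trailing partial THP. */
static struct zap_span split_range(uint64_t start, uint64_t len)
{
	struct zap_span s;

	/* The sketch only handles a PMD-aligned start, as seen in the trace. */
	assert(start % ASSUMED_PMD_SIZE == 0);

	s.full_thps = len / ASSUMED_PMD_SIZE;
	s.tail_subpages = (len % ASSUMED_PMD_SIZE) / ASSUMED_PAGE_SIZE;
	return s;
}
```

Calling split_range(0x7feca9000000, 0x7fb000) gives full_thps == 3 and tail_subpages == 507. The partially covered THP is the one that has to be unmapped sub-page by sub-page.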

This reminds me: should we reinstate Rik's commit?

>
> If the underlying physical page is a THP, madvise(MADV_DONTNEED) can
> significantly raise split_queue_lock contention.  perf showed the
> following data:
>     14.85%     0.00%  ld.lld           [kernel.kallsyms]           [k]
>        entry_SYSCALL_64_after_hwframe
>            11.52%
>                 entry_SYSCALL_64_after_hwframe
>                 do_syscall_64
>                 __x64_sys_madvise
>                 do_madvise.part.0
>                 zap_page_range
>                 unmap_single_vma
>                 unmap_page_range
>                 page_remove_rmap
>                 deferred_split_huge_page
>                 __lock_text_start
>                 native_queued_spin_lock_slowpath
>
> If a THP can't be removed from rmap as a whole THP, it is removed
> partially, by removing its sub-pages from rmap.  Even if the THP head
> page has already been added to the deferred queue, split_queue_lock is
> still acquired just to check whether the head page is in the queue.
> Thus, contention on split_queue_lock is raised.
>
> Before acquiring split_queue_lock, check and bail out early if the THP
> head page is in the queue already.  The check without holding
> split_queue_lock could race with deferred_split_scan, but that doesn't
> impact correctness here.
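
The check-before-lock pattern described above can be sketched in userspace roughly as follows. This is an illustrative analogue, not the kernel code: list_node, split_queue, huge_page, and queue_deferred_split are hypothetical names, and a C11 atomic_flag spinlock stands in for split_queue_lock. The sketch is meant to be exercised from a single thread. A multi-threaded userspace version would need an atomic load for the unlocked check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal circular list node, loosely modeled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_node_init(struct list_node *n) { n->next = n; n->prev = n; }
static bool list_node_empty(const struct list_node *n) { return n->next == n; }

static void list_node_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical stand-in for the deferred split queue. */
struct split_queue {
	atomic_flag lock;                 /* stand-in for split_queue_lock */
	struct list_node head;
	unsigned long len;
	unsigned long lock_acquisitions;  /* instrumentation for this sketch only */
};

/* Hypothetical stand-in for a THP with its deferred-list linkage. */
struct huge_page {
	struct list_node deferred;
};

static void split_queue_init(struct split_queue *q)
{
	atomic_flag_clear(&q->lock);
	list_node_init(&q->head);
	q->len = 0;
	q->lock_acquisitions = 0;
}

static void queue_deferred_split(struct split_queue *q, struct huge_page *page)
{
	assert(q && page);

	/* Unlocked fast path: already queued, so skip the lock entirely. */
	if (!list_node_empty(&page->deferred))
		return;

	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;  /* spin until the lock is free */
	q->lock_acquisitions++;

	/* Re-check under the lock, since the unlocked check may be stale. */
	if (list_node_empty(&page->deferred)) {
		list_node_add_tail(&page->deferred, &q->head);
		q->len++;
	}

	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}
```

If queue_deferred_split is called repeatedly for the same page, for example once per sub-page removed, only the first call takes the lock. Without the early return every call would take it, which is the contention the patch is avoiding.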
>
> Test results of building the kernel with ld.lld:
> commit 7b5a0b664ebe (parent commit of f35b5d7d676e):
> time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
>         6:07.99 real,   26367.77 user,  5063.35 sys
>
> commit f35b5d7d676e:
> time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
>         7:22.15 real,   26235.03 user,  12504.55 sys
>
> commit f35b5d7d676e with this patch applied:
> time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
>         6:08.49 real,   26520.15 user,  5047.91 sys
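
For a quick read of the numbers above, the ratios can be computed with a small sketch like the one below. The helper names to_seconds and ratio are hypothetical. From the quoted figures, commit f35b5d7d676e costs roughly 20% more wall time and roughly 2.5x the sys time of its parent, and with the patch applied both return to within about 1% of the parent's values.

```c
#include <assert.h>

/* Convert an "M:SS.ss" elapsed time, given as two numbers, into seconds. */
static double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Ratio of a measured value to a baseline value (baseline must be positive). */
static double ratio(double measured, double baseline)
{
	assert(baseline > 0.0);
	return measured / baseline;
}
```

For example, ratio(to_seconds(7, 22.15), to_seconds(6, 7.99)) gives the wall-time slowdown, and ratio(12504.55, 5063.35) gives the sys-time increase.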
>
> Link: https://lkml.kernel.org/r/20221223135207.2275317-1-fengwei.yin@xxxxxxxxx
> Signed-off-by: Yin Fengwei <fengwei.yin@xxxxxxxxx>
> Tested-by: Nathan Chancellor <nathan@xxxxxxxxxx>
> Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
> Reviewed-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
> Cc: Feng Tang <feng.tang@xxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxxx>
> Cc: Xing Zhengjun <zhengjun.xing@xxxxxxxxxxxxxxx>
> Cc: Yang Shi <shy828301@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
>  mm/huge_memory.c |    3 +++
>  1 file changed, 3 insertions(+)
>
> --- a/mm/huge_memory.c~mm-thp-check-and-bail-out-if-page-in-deferred-queue-already
> +++ a/mm/huge_memory.c
> @@ -2835,6 +2835,9 @@ void deferred_split_huge_page(struct pag
>         if (PageSwapCache(page))
>                 return;
>
> +       if (!list_empty(page_deferred_list(page)))
> +               return;
> +
>         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>         if (list_empty(page_deferred_list(page))) {
>                 count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> _
>
> Patches currently in -mm which might be from fengwei.yin@xxxxxxxxx are
>
>