On 20/01/2024 12:04, Ryan Roberts wrote:
> On 14/12/2023 22:34, Yang Shi wrote:
>> From: Rik van Riel <riel@xxxxxxxxxxx>
>>
>> Align larger anonymous memory mappings on THP boundaries by going through
>> thp_get_unmapped_area if THPs are enabled for the current process.
>>
>> With this patch, larger anonymous mappings are now THP aligned. When a
>> malloc library allocates a 2MB or larger arena, that arena can now be
>> mapped with THPs right from the start, which can result in better TLB hit
>> rates and execution time.
>>
>> Link: https://lkml.kernel.org/r/20220809142457.4751229f@xxxxxxxxxxxxxxxxxxxx
>> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
>> Reviewed-by: Yang Shi <shy828301@xxxxxxxxx>
>> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
>> Cc: Christopher Lameter <cl@xxxxxxxxx>
>> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> ---
>> This patch was applied to v6.1, but was reverted due to a regression
>> report. However, it turned out the regression was not due to this patch.
>> I pinged Andrew to reapply this patch, but Andrew may have forgotten it.
>> This patch helps promote THP, so I rebased it onto the latest mm-unstable.
>
> Hi Yang,
>
> I'm not sure what regression you are referring to above, but I'm seeing a
> performance regression in the virtual_address_range mm selftest on arm64,
> caused by this patch (which is now in v6.7).
>
> I see 2 problems when running the test: 1) it takes much longer to execute,
> and 2) the test fails. Both are related:
>
> The (first part of the) test allocates as many 1GB anonymous blocks as it
> can in the low 256TB of address space, passing NULL as the addr hint to
> mmap. Before this patch, all allocations were abutted and contained in a
> single, merged VMA. However, after this patch, each allocation is in its
> own VMA, and there is a 2M gap between each VMA. This causes 2 problems:
> 1) mmap becomes MUCH slower because there are so many VMAs to check to
> find a new 1G gap. 2) It fails once it hits the VMA limit
> (/proc/sys/vm/max_map_count). Hitting this limit then causes a subsequent
> calloc() to fail, which causes the test to fail.
>
> Looking at the code, I think the problem is that arm64 selects
> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area()
> allocates len+2M then always aligns to the bottom of the discovered gap.
> That causes the 2M hole. As far as I can see, x86 allocates bottom up, so
> you don't get a hole.
>
> I'm not quite sure what the fix is - perhaps __thp_get_unmapped_area()
> should be implemented around vm_unmapped_area(), which can manage the
> alignment more intelligently?
>
> But until/unless someone comes along with a fix, I think this patch should
> be reverted.

Looks like this patch is also the cause of `ksm_tests -H -s 100` starting to
fail on arm64. I haven't looked in detail, but it passes without the change
and fails with it. So this should definitely be reverted, I think.

>
> Thanks,
> Ryan
>
>
>>
>>  mm/mmap.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 9d780f415be3..dd25a2aa94f7 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2232,6 +2232,9 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>>  		 */
>>  		pgoff = 0;
>>  		get_area = shmem_get_unmapped_area;
>> +	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
>> +		/* Ensures that larger anonymous mappings are THP aligned. */
>> +		get_area = thp_get_unmapped_area;
>>  	}
>>
>>  	addr = get_area(file, addr, len, pgoff, flags);
>
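As an aside, the 2M-hole effect described in the thread can be sketched with a small toy model (this is illustrative only, not kernel code; the constants, helper names, and address ranges are my own). It models a top-down getter that requests len + 2M and aligns the bottom of the discovered gap up, versus a bottom-up getter where the alignment becomes a no-op once mappings are aligned:

```python
# Toy model of the alignment behaviour discussed above (illustrative only,
# not kernel code; names and constants are assumptions for the sketch).

PMD_SIZE = 2 * 1024 * 1024   # 2M: the THP alignment requested by the patch
GB = 1 << 30                 # each test allocation is 1G, as in the selftest

def align_up(addr, align):
    return (addr + align - 1) & ~(align - 1)

def topdown_alloc(top, length, vmas):
    # Top-down model: the gap is found below the lowest existing VMA,
    # sized length + 2M, and the *bottom* of that gap is aligned up --
    # leaving a 2M hole between this VMA and the one above it.
    gap_top = min((start for start, _ in vmas), default=top)
    addr = align_up(gap_top - (length + PMD_SIZE), PMD_SIZE)
    vmas.append((addr, addr + length))

def bottomup_alloc(bottom, length, vmas):
    # Bottom-up model: the gap starts just above the highest existing VMA;
    # once mappings are 2M-aligned, align_up is a no-op, so successive
    # VMAs abut (and could merge into one).
    gap_start = max((end for _, end in vmas), default=bottom)
    addr = align_up(gap_start, PMD_SIZE)
    vmas.append((addr, addr + length))

def gaps(vmas):
    vmas = sorted(vmas)
    return [vmas[i + 1][0] - vmas[i][1] for i in range(len(vmas) - 1)]

td, bu = [], []
for _ in range(4):
    topdown_alloc(1 << 40, GB, td)   # arbitrary 2M-aligned ceiling
    bottomup_alloc(1 << 20, GB, bu)  # arbitrary (unaligned) floor

print("top-down gaps: ", gaps(td))   # a 2M hole between every pair of VMAs
print("bottom-up gaps:", gaps(bu))   # VMAs abut, no holes
```

In this model the top-down case leaves a PMD_SIZE hole between every pair of 1G mappings, matching the unmergeable-VMA behaviour reported for arm64, while the bottom-up case produces abutting mappings, matching the x86 observation.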