On Thu, Oct 24, 2024 at 05:12:29PM +0200, Vlastimil Babka wrote:
> Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> boundaries") a mmap() of anonymous memory without a specific address
> hint and of at least PMD_SIZE will be aligned to PMD so that it can
> benefit from a THP backing page.
>
> However this change has been shown to regress some workloads
> significantly. [1] reports regressions in various spec benchmarks, with
> up to 600% slowdown of the cactusBSSN benchmark on some platforms. The

Ugh god.

> benchmark seems to create many mappings of 4632kB, which would have
> merged to a large THP-backed area before commit efa7df3e3bb5 and now
> they are fragmented to multiple areas each aligned to PMD boundary with
> gaps between. The regression then seems to be caused mainly due to the
> benchmark's memory access pattern suffering from TLB or cache aliasing
> due to the aligned boundaries of the individual areas.

Any more details on precisely why?

>
> Another known regression bisected to commit efa7df3e3bb5 is darktable
> [2] [3] and early testing suggests this patch fixes the regression there
> as well.

Good!

>
> To fix the regression but still try to benefit from THP-friendly
> anonymous mapping alignment, add a condition that the size of the
> mapping must be a multiple of PMD size instead of at least PMD size. In
> case of many odd-sized mapping like the cactusBSSN creates, those will
> stop being aligned and with gaps between, and instead naturally merge
> again.
>

Seems like the original logic just padded the length by PMD size and
checked for overflow, assuming that [pgoff << PAGE_SHIFT,
(pgoff << PAGE_SHIFT) + len) contains at least one PMD-sized block.
Which I guess results in potentially getting mis-sized empty spaces
that now can't be PMD-merged at the bits that 'overhang' the
PMD-sized/aligned part; e.g. with 2MB PMDs, each 4632kB mapping covers
two full PMDs plus a 536kB tail, leaving a 1512kB hole before the next
aligned mapping (quick userspace repro sketch at the end of this mail).
Which is, yeah, not great and would explain this (correct me if my
understanding is wrong).

> Reported-by: Michael Matz <matz@xxxxxxx>
> Debugged-by: Gabriel Krisman Bertazi <gabriel@xxxxxxxxxx>
> Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229012 [1]
> Reported-by: Matthias Bodenbinder <matthias@xxxxxxxxxxxxxx>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219366 [2]
> Closes: https://lore.kernel.org/all/2050f0d4-57b0-481d-bab8-05e8d48fed0c@xxxxxxxxxxxxx/ [3]
> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
> Cc: <stable@xxxxxxxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxxx>
> Cc: Yang Shi <yang@xxxxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> ---
>  mm/mmap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 9c0fb43064b5..a5297cfb1dfc 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -900,7 +900,8 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>
>  	if (get_area) {
>  		addr = get_area(file, addr, len, pgoff, flags);
> -	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> +	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
> +		   && IS_ALIGNED(len, PMD_SIZE)) {

So doing this feels right, but... hm, this seems like it belongs in
__thp_get_unmapped_area(), which does a bunch of checks up front,
returning 0 if they fail, which then results in it performing the
normal get-unmapped-area logic. That also has a bunch of (offset)
alignment checks as well as overflow checks, so it would seem the
natural place to also check the length?
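I.e. something like the below (untested sketch; the surrounding checks
are from my reading of the top of __thp_get_unmapped_area() in
mm/huge_memory.c, where 'size' is the PMD_SIZE argument passed down by
thp_get_unmapped_area_vmflags(), so treat the context lines as
approximate):

	if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall())
		return 0;

	/* Odd-sized mappings can't use a whole aligned area anyway;
	 * returning 0 here falls back to the regular placement so
	 * adjacent areas can still merge. */
	if (!IS_ALIGNED(len, size))
		return 0;

	if (off_end <= off_align || (off_end - off_align) < size)
		return 0;

That would keep the policy next to the other THP placement checks
instead of open-coding it in __get_unmapped_area().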
>  		/* Ensures that larger anonymous mappings are THP aligned.
>  		 */
>  		addr = thp_get_unmapped_area_vmflags(file, addr, len,
>  						     pgoff, flags, vm_flags);
> --
> 2.47.0
>
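FWIW, on the fragmentation question above, a quick untested userspace
sketch to eyeball the placement behaviour (the 4632kB size is taken
from the cactusBSSN report; compare the printed addresses, or
/proc/self/maps, with and without this patch):

	#include <stdio.h>
	#include <sys/mman.h>

	int main(void)
	{
		/* Mapping size from the report: not a PMD multiple. */
		size_t len = 4632UL * 1024;

		for (int i = 0; i < 4; i++) {
			void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
			if (p == MAP_FAILED) {
				perror("mmap");
				return 1;
			}
			/* With the THP alignment: 2MB-aligned addresses
			 * with holes between mappings. With this fix:
			 * adjacent, mergeable areas. */
			printf("mapping %d at %p\n", i, p);
		}
		return 0;
	}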