Re: [PATCH] mm: Respect mmap hint address when aligning for THP

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 18, 2024 at 1:44 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Mon, Nov 18, 2024 at 9:52 AM Kalesh Singh <kaleshsingh@xxxxxxxxxx> wrote:
> >
> > On Mon, Nov 18, 2024 at 9:05 AM Yang Shi <shy828301@xxxxxxxxx> wrote:
> > >
> > > On Sun, Nov 17, 2024 at 3:12 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> > > >
> > > > On 11/15/24 23:15, Yang Shi wrote:
> > > > > On Fri, Nov 15, 2024 at 1:52 PM Kalesh Singh <kaleshsingh@xxxxxxxxxx> wrote:
> > > > >>
> > > > >> Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> > > > >> boundaries") updated __get_unmapped_area() to align the start address
> > > > >> for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.
> > > > >>
> > > > >> It does this by effectively looking up a region that is of size,
> > > > >> request_size + PMD_SIZE, and aligning up the start to a PMD boundary.
> > > > >>
> > > > >> Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment
> > > > >> on 32 bit") opted out of this for 32bit due to regressions in mmap base
> > > > >> randomization.
> > > > >>
> > > > >> Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous
> > > > >> mappings to PMD-aligned sizes") restricted this to only mmap sizes that
> > > > >> are multiples of the PMD_SIZE due to reported regressions in some
> > > > >> performance benchmarks -- which seemed mostly due to the reduced spatial
> > > > >> locality of related mappings due to the forced PMD-alignment.
> > > > >>
> > > > >> Another unintended side effect has emerged: When a user specifies an mmap
> > > > >> hint address, the THP alignment logic modifies the behavior, potentially
> > > > >> ignoring the hint even if a sufficiently large gap exists at the requested
> > > > >> hint location.
> > > > >>
> > > > >> Example Scenario:
> > > > >>
> > > > >> Consider the following simplified virtual address (VA) space:
> > > > >>
> > > > >>     ...
> > > > >>
> > > > >>     0x200000-0x400000 --- VMA A
> > > > >>     0x400000-0x600000 --- Hole
> > > > >>     0x600000-0x800000 --- VMA B
> > > > >>
> > > > >>     ...
> > > > >>
> > > > >> A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:
> > > > >>
> > > > >>   - Before THP alignment: The requested region (size 0x200000) fits into
> > > > >>     the gap at 0x400000, so the hint is respected.
> > > > >>
> > > > >>   - After alignment: The logic searches for a region of size
> > > > >>     0x400000 (len + PMD_SIZE) starting at 0x400000.
> > > > >>     This search fails due to the mapping at 0x600000 (VMA B), and the hint
> > > > >>     is ignored, falling back to arch_get_unmapped_area[_topdown]().
> > > >
> >
> > Hi all, Thanks for the reviews.
> >
> > > > Hmm looks like the search is not done in the optimal way regardless of
> > > > whether or not it ignores a hint - it should be able to find the hole, no?
> >
> > It's not able to find the hole in the example case because the size we
> > are looking for is now
> > (requested size + padding len) which is larger than the hole we have
> > at the hint address.
> >
> > > >
> > > > >> In general the hint is effectively ignored, if there is any
> > > > >> existing mapping in the below range:
> > > > >>
> > > > >>      [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)
> > > > >>
> > > > >> This changes the semantics of mmap hint; from ""Respect the hint if a
> > > > >> sufficiently large gap exists at the requested location" to "Respect the
> > > > >> hint only if an additional PMD-sized gap exists beyond the requested size".
> > > > >>
> > > > >> This has performance implications for allocators that allocate their heap
> > > > >> using mmap but try to keep it "as contiguous as possible" by using the
> > > > >> end of the exisiting heap as the address hint. With the new behavior
> > > > >> it's more likely to get a much less contiguous heap, adding extra
> > > > >> fragmentation and performance overhead.
> > > > >>
> > > > >> To restore the expected behavior; don't use thp_get_unmapped_area_vmflags()
> > > > >> when the user provided a hint address.
> > > >
> > > > Agreed, the hint should take precendence.
> > > >
> > > > > Thanks for fixing it. I agree we should respect the hint address. But
> > > > > this patch actually just fixed anonymous mapping and the file mappings
> > > > > which don't support thp_get_unmapped_area(). So I think you should
> > > > > move the hint check to __thp_get_unmapped_area().
> > > > >
> > > > > And Vlastimil's fix d4148aeab412 ("mm, mmap: limit THP alignment of
> > > > > anonymous mappings to PMD-aligned sizes") should be moved to there too
> > > > > IMHO.
> > > >
> > > > This was brought up, but I didn't want to do it as part of the stable fix as
> > > > that would change even situations that Rik's change didn't.
> > > > If the mmap hint change is another stable hotfix, I wouldn't conflate it
> > > > either. But we can try it for further development. But careful about just
> > > > moving the code as-is, the file-based mappings are different than anonymous
> > > > memory and I believe file offsets matter:
> > > >
> > > > https://lore.kernel.org/all/9d7c73f6-1e1a-458b-93c6-3b44959022e0@xxxxxxx/
> > > >
> > > > https://lore.kernel.org/all/5f7a49e8-0416-4648-a704-a7a67e8cd894@xxxxxxx/
> > >
> >
> > I see, so I think we should keep the check here to address the
> > immediate regression for anonymous mappings.
> >
> > I believe what we would need to address this longer term is to have
> > vma_iter_lowest() [1] vma_iter_highest() [2] take into account the
> > alignment when doing the search. That way we don't need to inflate the
> > search size to facilitate the manual alignment after the fact. I
> > haven't looked too too deeply into this, maybe Liam has some ideas
> > about that?
> >
> > [1] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L420
> >
> > [2] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L426
> >
> > > Did some research about the history of the code, I found this commit:
> > >
> > > 97d3d0f9a1cf ("mm/huge_memory.c: thp: fix conflict of above-47bit hint
> > > address and PMD alignment"), it tried to fix "the function would not
> > > try to allocate PMD-aligned area if *any* hint address specified."
> > > It was for file mapping back then since anonymous mapping THP
> > > alignment was not supported yet.
> > >
> > > But it seems like this patch somehow tried to do something reverse. It
> > > may not be correct either.
> > >
> >
> > IIUC Kirill's patch is doing the right thing (mostly), i.e. it will
> > return the hint address (without any rounding to PMD alignment). The
> > case it doesn't handle is what I mentioned above, where we aren't able
> > to find the hole at the hint address in the first place because the
> > hole is smaller than (size + padding len)
>
> Yes. But my point is your fix (just simply skip PMD alignment when
> hint is specified) actually broke what Kirill fixed IIUC.

Hi Yang,

It's true the PMD alignment is skipped in that case for the anon
mappings. Though I believe that's still what we want to have here
initially as we shouldn't regress the hint behaviour.

I've posted the v2 here:
https://lore.kernel.org/lkml/20241118214650.3667577-1-kaleshsingh@xxxxxxxxxx/

>
> >
> > > With Vlastimis's fix, we just try to make the address THP aligned for
> > > anonymous mapping when the size is PMD aligned. So we don't need to
> > > take into account the padding for anonymous mapping anymore.
> > >
> >
> > We still need to have padding length because PMD alignment of the size
> > doesn't mean that the start address returned by the search will be PMD
> > aligned. Inherently those are only PAGE aligned.
>
> Yes, you are right, I overlooked this. I think we can do this in two
> passes. Use len w/o padding in the first pass. If the returned address
> equals the hint or it is already PMD aligned, just return it.
> Otherwise it means there is no hole with suitable size and alignment.
> In the second pass, we redo it with padding. Just off the top of my
> head, others may have better ideas.
>

You are right, it's one way we can do it. Though, I am concerned that
the 2 passes will add overhead on mmap() performance. One idea I have
is to move the alignment handling lower down to
vma_iter_highest()/lowest(). Interested to hear your thoughts on that.
We can do this in a follow up patch, which should fix file mappings as
well.

Thanks,
Kalesh

> >
> > Thanks,
> > Kalesh
> >
> > > So IIUC we should do something like:
> > >
> > > @@ -1085,7 +1085,11 @@ static unsigned long
> > > __thp_get_unmapped_area(struct file *filp,
> > >         if (off_end <= off_align || (off_end - off_align) < size)
> > >                 return 0;
> > >
> > > -       len_pad = len + size;
> > > +       if (filp)
> > > +               len_pad = len + size;
> > > +       else
> > > +               len_pad = len;
> > > +
> > >         if (len_pad < len || (off + len_pad) < off)
> > >                 return 0;
> > >
> > > >
> > > > >> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > > >> Cc: Vlastimil Babka <vbabka@xxxxxxx>
> > > > >> Cc: Yang Shi <yang@xxxxxxxxxxxxxxxxxxxxxx>
> > > > >> Cc: Rik van Riel <riel@xxxxxxxxxxx>
> > > > >> Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
> > > > >> Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > > > >> Cc: Minchan Kim <minchan@xxxxxxxxxx>
> > > > >> Cc: Hans Boehm <hboehm@xxxxxxxxxx>
> > > > >> Cc: Lokesh Gidra <lokeshgidra@xxxxxxxxxx>
> > > > >> Cc: <stable@xxxxxxxxxxxxxxx>
> > > > >> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
> > > > >> Signed-off-by: Kalesh Singh <kaleshsingh@xxxxxxxxxx>
> > > >
> > > > Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx>
> > > >
> > > > >> ---
> > > > >>  mm/mmap.c | 1 +
> > > > >>  1 file changed, 1 insertion(+)
> > > > >>
> > > > >> diff --git a/mm/mmap.c b/mm/mmap.c
> > > > >> index 79d541f1502b..2f01f1a8e304 100644
> > > > >> --- a/mm/mmap.c
> > > > >> +++ b/mm/mmap.c
> > > > >> @@ -901,6 +901,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> > > > >>         if (get_area) {
> > > > >>                 addr = get_area(file, addr, len, pgoff, flags);
> > > > >>         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
> > > > >> +                  && !addr /* no hint */
> > > > >>                    && IS_ALIGNED(len, PMD_SIZE)) {
> > > > >>                 /* Ensures that larger anonymous mappings are THP aligned. */
> > > > >>                 addr = thp_get_unmapped_area_vmflags(file, addr, len,
> > > > >>
> > > > >> base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
> > > > >> --
> > > > >> 2.47.0.338.g60cca15819-goog
> > > > >>
> > > >





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux