On Mon, Nov 18, 2024 at 1:44 PM Yang Shi <shy828301@xxxxxxxxx> wrote: > > On Mon, Nov 18, 2024 at 9:52 AM Kalesh Singh <kaleshsingh@xxxxxxxxxx> wrote: > > > > On Mon, Nov 18, 2024 at 9:05 AM Yang Shi <shy828301@xxxxxxxxx> wrote: > > > > > > On Sun, Nov 17, 2024 at 3:12 AM Vlastimil Babka <vbabka@xxxxxxx> wrote: > > > > > > > > On 11/15/24 23:15, Yang Shi wrote: > > > > > On Fri, Nov 15, 2024 at 1:52 PM Kalesh Singh <kaleshsingh@xxxxxxxxxx> wrote: > > > > >> > > > > >> Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP > > > > >> boundaries") updated __get_unmapped_area() to align the start address > > > > >> for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y. > > > > >> > > > > >> It does this by effectively looking up a region that is of size, > > > > >> request_size + PMD_SIZE, and aligning up the start to a PMD boundary. > > > > >> > > > > >> Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment > > > > >> on 32 bit") opted out of this for 32bit due to regressions in mmap base > > > > >> randomization. > > > > >> > > > > >> Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous > > > > >> mappings to PMD-aligned sizes") restricted this to only mmap sizes that > > > > >> are multiples of the PMD_SIZE due to reported regressions in some > > > > >> performance benchmarks -- which seemed mostly due to the reduced spatial > > > > >> locality of related mappings due to the forced PMD-alignment. > > > > >> > > > > >> Another unintended side effect has emerged: When a user specifies an mmap > > > > >> hint address, the THP alignment logic modifies the behavior, potentially > > > > >> ignoring the hint even if a sufficiently large gap exists at the requested > > > > >> hint location. > > > > >> > > > > >> Example Scenario: > > > > >> > > > > >> Consider the following simplified virtual address (VA) space: > > > > >> > > > > >> ... > > > > >> > > > > >> 0x200000-0x400000 --- VMA A > > > > >> 0x400000-0x600000 --- Hole > > > > >> 0x600000-0x800000 --- VMA B > > > > >> > > > > >> ... > > > > >> > > > > >> A call to mmap() with hint=0x400000 and len=0x200000 behaves differently: > > > > >> > > > > >> - Before THP alignment: The requested region (size 0x200000) fits into > > > > >> the gap at 0x400000, so the hint is respected. > > > > >> > > > > >> - After alignment: The logic searches for a region of size > > > > >> 0x400000 (len + PMD_SIZE) starting at 0x400000. > > > > >> This search fails due to the mapping at 0x600000 (VMA B), and the hint > > > > >> is ignored, falling back to arch_get_unmapped_area[_topdown](). > > > > > > > > Hi all, Thanks for the reviews. > > > > > > Hmm looks like the search is not done in the optimal way regardless of > > > > whether or not it ignores a hint - it should be able to find the hole, no? > > > > It's not able to find the hole in the example case because the size we > > are looking for is now > > (requested size + padding len) which is larger than the hole we have > > at the hint address. > > > > > > > > > > >> In general the hint is effectively ignored, if there is any > > > > >> existing mapping in the below range: > > > > >> > > > > >> [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE) > > > > >> > > > > >> This changes the semantics of mmap hint; from ""Respect the hint if a > > > > >> sufficiently large gap exists at the requested location" to "Respect the > > > > >> hint only if an additional PMD-sized gap exists beyond the requested size". > > > > >> > > > > >> This has performance implications for allocators that allocate their heap > > > > >> using mmap but try to keep it "as contiguous as possible" by using the > > > > >> end of the exisiting heap as the address hint. With the new behavior > > > > >> it's more likely to get a much less contiguous heap, adding extra > > > > >> fragmentation and performance overhead. > > > > >> > > > > >> To restore the expected behavior; don't use thp_get_unmapped_area_vmflags() > > > > >> when the user provided a hint address. > > > > > > > > Agreed, the hint should take precendence. > > > > > > > > > Thanks for fixing it. I agree we should respect the hint address. But > > > > > this patch actually just fixed anonymous mapping and the file mappings > > > > > which don't support thp_get_unmapped_area(). So I think you should > > > > > move the hint check to __thp_get_unmapped_area(). > > > > > > > > > > And Vlastimil's fix d4148aeab412 ("mm, mmap: limit THP alignment of > > > > > anonymous mappings to PMD-aligned sizes") should be moved to there too > > > > > IMHO. > > > > > > > > This was brought up, but I didn't want to do it as part of the stable fix as > > > > that would change even situations that Rik's change didn't. > > > > If the mmap hint change is another stable hotfix, I wouldn't conflate it > > > > either. But we can try it for further development. But careful about just > > > > moving the code as-is, the file-based mappings are different than anonymous > > > > memory and I believe file offsets matter: > > > > > > > > https://lore.kernel.org/all/9d7c73f6-1e1a-458b-93c6-3b44959022e0@xxxxxxx/ > > > > > > > > https://lore.kernel.org/all/5f7a49e8-0416-4648-a704-a7a67e8cd894@xxxxxxx/ > > > > > > > I see, so I think we should keep the check here to address the > > immediate regression for anonymous mappings. > > > > I believe what we would need to address this longer term is to have > > vma_iter_lowest() [1] vma_iter_highest() [2] take into account the > > alignment when doing the search. That way we don't need to inflate the > > search size to facilitate the manual alignment after the fact. I > > haven't looked too too deeply into this, maybe Liam has some ideas > > about that? > > > > [1] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L420 > > > > [2] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L426 > > > > > Did some research about the history of the code, I found this commit: > > > > > > 97d3d0f9a1cf ("mm/huge_memory.c: thp: fix conflict of above-47bit hint > > > address and PMD alignment"), it tried to fix "the function would not > > > try to allocate PMD-aligned area if *any* hint address specified." > > > It was for file mapping back then since anonymous mapping THP > > > alignment was not supported yet. > > > > > > But it seems like this patch somehow tried to do something reverse. It > > > may not be correct either. > > > > > > > IIUC Kirill's patch is doing the right thing (mostly), i.e. it will > > return the hint address (without any rounding to PMD alignment). The > > case it doesn't handle is what I mentioned above, where we aren't able > > to find the hole at the hint address in the first place because the > > hole is smaller than (size + padding len) > > Yes. But my point is your fix (just simply skip PMD alignment when > hint is specified) actually broke what Kirill fixed IIUC. Hi Yang, It's true the PMD alignment is skipped in that case for the anon mappings. Though I believe that's still what we want to have here initially as we shouldn't regress the hint behaviour. I've posted the v2 here: https://lore.kernel.org/lkml/20241118214650.3667577-1-kaleshsingh@xxxxxxxxxx/ > > > > > > With Vlastimis's fix, we just try to make the address THP aligned for > > > anonymous mapping when the size is PMD aligned. So we don't need to > > > take into account the padding for anonymous mapping anymore. > > > > > > > We still need to have padding length because PMD alignment of the size > > doesn't mean that the start address returned by the search will be PMD > > aligned. Inherently those are only PAGE aligned. > > Yes, you are right, I overlooked this. I think we can do this in two > passes. Use len w/o padding in the first pass. If the returned address > equals the hint or it is already PMD aligned, just return it. > Otherwise it means there is no hole with suitable size and alignment. > In the second pass, we redo it with padding. Just off the top of my > head, others may have better ideas. > You are right, it's one way we can do it. Though, I am concerned that the 2 passes will add overhead on mmap() performance. One idea I have is to move the alignment handling lower down to vma_iter_highest()/lowest(). Interested to hear your thoughts on that. We can do this in a follow up patch, which should fix file mappings as well. Thanks, Kalesh > > > > Thanks, > > Kalesh > > > > > So IIUC we should do something like: > > > > > > @@ -1085,7 +1085,11 @@ static unsigned long > > > __thp_get_unmapped_area(struct file *filp, > > > if (off_end <= off_align || (off_end - off_align) < size) > > > return 0; > > > > > > - len_pad = len + size; > > > + if (filp) > > > + len_pad = len + size; > > > + else > > > + len_pad = len; > > > + > > > if (len_pad < len || (off + len_pad) < off) > > > return 0; > > > > > > > > > > > >> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > > > >> Cc: Vlastimil Babka <vbabka@xxxxxxx> > > > > >> Cc: Yang Shi <yang@xxxxxxxxxxxxxxxxxxxxxx> > > > > >> Cc: Rik van Riel <riel@xxxxxxxxxxx> > > > > >> Cc: Ryan Roberts <ryan.roberts@xxxxxxx> > > > > >> Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx> > > > > >> Cc: Minchan Kim <minchan@xxxxxxxxxx> > > > > >> Cc: Hans Boehm <hboehm@xxxxxxxxxx> > > > > >> Cc: Lokesh Gidra <lokeshgidra@xxxxxxxxxx> > > > > >> Cc: <stable@xxxxxxxxxxxxxxx> > > > > >> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") > > > > >> Signed-off-by: Kalesh Singh <kaleshsingh@xxxxxxxxxx> > > > > > > > > Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx> > > > > > > > > >> --- > > > > >> mm/mmap.c | 1 + > > > > >> 1 file changed, 1 insertion(+) > > > > >> > > > > >> diff --git a/mm/mmap.c b/mm/mmap.c > > > > >> index 79d541f1502b..2f01f1a8e304 100644 > > > > >> --- a/mm/mmap.c > > > > >> +++ b/mm/mmap.c > > > > >> @@ -901,6 +901,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, > > > > >> if (get_area) { > > > > >> addr = get_area(file, addr, len, pgoff, flags); > > > > >> } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) > > > > >> + && !addr /* no hint */ > > > > >> && IS_ALIGNED(len, PMD_SIZE)) { > > > > >> /* Ensures that larger anonymous mappings are THP aligned. */ > > > > >> addr = thp_get_unmapped_area_vmflags(file, addr, len, > > > > >> > > > > >> base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623 > > > > >> -- > > > > >> 2.47.0.338.g60cca15819-goog > > > > >> > > > >