>On 5/14/20 7:31 AM, Shijie Hu wrote: >> Here is a final patch to solve that hugetlb_get_unmapped_area() can't >> get unmapped area below mmap base for huge pages based on a few previous >> discussions and patches from me. >> >> I'm so sorry. When sending v2 and v3 patches, I forget to cc: >> linux-mm@xxxxxxxxx and linux-kernel@xxxxxxxxxxxxxxx. No records of these >> two patches found on the https://lkml.org/lkml/. >> >> Patch V1: https://lkml.org/lkml/2020/4/27/440 >> Patch V4: https://lkml.org/lkml/2020/5/13/980 >> >> Changes in V2: >> * Follow Mike's suggestions, move hugetlb_get_unmapped_area() routines >> from "fs/hugetlbfs/inode.c" to "arch/arm64/mm/hugetlbpage.c", without >> changing core code. >> * Add mmap_is_legacy() function, and only fall back to the bottom-up >> function on no-legacy mode. >> >> Changes in V3: >> * Add *bottomup() and *topdown() two function, and check if >> mm->get_unmapped_area is equal to arch_get_unmapped_area() instead of >> checking mmap_is_legacy() to determine which function should be used. >> >> Changes in V4: >> * Follow the suggestions of Will and Mike, move back this patch to core >> code. >> >> Changes in V5: >> * Fix kbuild test error. >> >> ------ >> >> In a 32-bit program, running on arm64 architecture. When the address >> space below mmap base is completely exhausted, shmat() for huge pages >> will return ENOMEM, but shmat() for normal pages can still success on >> no-legacy mode. This seems not fair. >> >> For normal pages, get_unmapped_area() calling flows are: >> => mm->get_unmapped_area() >> if on legacy mode, >> => arch_get_unmapped_area() >> => vm_unmapped_area() >> if on no-legacy mode, >> => arch_get_unmapped_area_topdown() >> => vm_unmapped_area() >> >> For huge pages, get_unmapped_area() calling flows are: >> => file->f_op->get_unmapped_area() >> => hugetlb_get_unmapped_area() >> => vm_unmapped_area() >> >> To solve this issue, we only need to make hugetlb_get_unmapped_area() take >> the same way as mm->get_unmapped_area(). Add *bottomup() and *topdown() >> two functions, and check current mm->get_unmapped_area() to decide which >> one to use. If mm->get_unmapped_area is equal to arch_get_unmapped_area(), >> hugetlb_get_unmapped_area() calls bottomup routine, otherwise calls topdown >> routine. >> >> Signed-off-by: Shijie Hu <hushijie3@xxxxxxxxxx> >> Reported-by: kbuild test robot <lkp@xxxxxxxxx> >> --- >> fs/hugetlbfs/inode.c | 62 +++++++++++++++++++++++++++++++++++++++++++++------- >> 1 file changed, 54 insertions(+), 8 deletions(-) >> >> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c >> index 991c60c7ffe0..61418380f492 100644 >> --- a/fs/hugetlbfs/inode.c >> +++ b/fs/hugetlbfs/inode.c >> @@ -38,6 +38,7 @@ >> #include <linux/uio.h> >> >> #include <linux/uaccess.h> >> +#include <linux/sched/mm.h> >> >> static const struct super_operations hugetlbfs_ops; >> static const struct address_space_operations hugetlbfs_aops; >> @@ -191,13 +192,60 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) >> >> #ifndef HAVE_ARCH_HUGETLB_UNMAPPED_AREA >> static unsigned long >> +hugetlb_get_unmapped_area_bottomup(struct file *file, unsigned long addr, >> + unsigned long len, unsigned long pgoff, unsigned long flags) >> +{ >> + struct hstate *h = hstate_file(file); >> + struct vm_unmapped_area_info info; >> + >> + info.flags = 0; >> + info.length = len; >> + info.low_limit = current->mm->mmap_base; >> + info.high_limit = TASK_SIZE; >> + info.align_mask = PAGE_MASK & ~huge_page_mask(h); >> + info.align_offset = 0; >> + return vm_unmapped_area(&info); >> +} >> + >> +static unsigned long >> +hugetlb_get_unmapped_area_topdown(struct file *file, unsigned long addr, >> + unsigned long len, unsigned long pgoff, unsigned long flags) >> +{ >> + struct hstate *h = hstate_file(file); >> + struct vm_unmapped_area_info info; >> + >> + info.flags = VM_UNMAPPED_AREA_TOPDOWN; >> + info.length = len; >> + info.low_limit = max(PAGE_SIZE, mmap_min_addr); >> + info.high_limit = current->mm->mmap_base; >> + info.align_mask = PAGE_MASK & ~huge_page_mask(h); >> + info.align_offset = 0; >> + addr = vm_unmapped_area(&info); >> + >> + /* >> + * A failed mmap() very likely causes application failure, >> + * so fall back to the bottom-up function here. This scenario >> + * can happen with large stack limits and large mmap() >> + * allocations. >> + */ >> + if (unlikely(offset_in_page(addr))) { >> + VM_BUG_ON(addr != -ENOMEM); >> + info.flags = 0; >> + info.low_limit = current->mm->mmap_base; >> + info.high_limit = TASK_SIZE; >> + addr = vm_unmapped_area(&info); >> + } >> + >> + return addr; >> +} >> + >> +static unsigned long >> hugetlb_get_unmapped_area(struct file *file, unsigned long addr, >> unsigned long len, unsigned long pgoff, unsigned long flags) >> { >> struct mm_struct *mm = current->mm; >> struct vm_area_struct *vma; >> struct hstate *h = hstate_file(file); >> - struct vm_unmapped_area_info info; >> >> if (len & ~huge_page_mask(h)) >> return -EINVAL; >> @@ -218,13 +266,11 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr, >> return addr; >> } >> >> - info.flags = 0; >> - info.length = len; >> - info.low_limit = TASK_UNMAPPED_BASE; >> - info.high_limit = TASK_SIZE; >> - info.align_mask = PAGE_MASK & ~huge_page_mask(h); >> - info.align_offset = 0; >> - return vm_unmapped_area(&info); >> + if (mm->get_unmapped_area == arch_get_unmapped_area) >> + return hugetlb_get_unmapped_area_bottomup(file, addr, len, >> + pgoff, flags); >> + return hugetlb_get_unmapped_area_topdown(file, addr, len, >> + pgoff, flags); > >I like this code using the value of mm->get_unmapped_area to determine >which routine to call. It is used by a few architectures. However, I >noticed that on at least one architecture (powerpc) mm->get_unmapped_area >may be assigned to routines other than arch_get_unmapped_area or >arch_get_unmapped_area_topdown. In such a case, we would call the 'new' >topdown routine. I would prefer that we call the bottomup routine in this >default case. > >In reality, this does not impact powerpc as that architecture has it's >own hugetlb_get_unmapped_area routine. > Yes, I also noticed this before, powerpc uses radix__arch_get_unmapped_area*() when CONFIG_PPC_RADIX_MMU opened as 'y' and radix_enabled() returns true. However, powerpc implemented its own hugetlb_get_unmapped_area(). This patch actually has no effect on powerpc. >Because of this, I suggest we add a comment above this code and switch >the if/else order. For example, > >+ /* >+ * Use mm->get_unmapped_area value as a hint to use topdown routine. >+ * If architectures have special needs, they should define their own >+ * version of hugetlb_get_unmapped_area. >+ */ >+ if (mm->get_unmapped_area == arch_get_unmapped_area_topdown) >+ return hugetlb_get_unmapped_area_topdown(file, addr, len, >+ pgoff, flags); >+ return hugetlb_get_unmapped_area_bottomup(file, addr, len, >+ pgoff, flags); > >Thoughts? >-- >Mike Kravetz > I agree with you. It's clever to switch the if/else order. If there is such a case, mm->get_unmapped_area() is neihter arch_get_unmapped_area() nor arch_get_unmapped_area_topdown(), it is indeed more appropriate to make the bottomup routine as the default behavior. May I put this code and comment you show above into patch v6 and add "Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>" to it? -- Shijie Hu > >> } >> #endif >> >>