On 5/14/20 7:31 AM, Shijie Hu wrote:
> Here is a final patch to solve the problem that hugetlb_get_unmapped_area()
> can't get an unmapped area below the mmap base for huge pages, based on a
> few previous discussions and patches from me.
>
> I'm sorry: when sending the v2 and v3 patches, I forgot to cc
> linux-mm@xxxxxxxxx and linux-kernel@xxxxxxxxxxxxxxx, so there are no
> records of those two patches on https://lkml.org/lkml/.
>
> Patch V1: https://lkml.org/lkml/2020/4/27/440
> Patch V4: https://lkml.org/lkml/2020/5/13/980
>
> Changes in V2:
> * Follow Mike's suggestions: move the hugetlb_get_unmapped_area()
>   routines from "fs/hugetlbfs/inode.c" to "arch/arm64/mm/hugetlbpage.c",
>   without changing core code.
> * Add an mmap_is_legacy() function, and only fall back to the bottom-up
>   function in non-legacy mode.
>
> Changes in V3:
> * Add *bottomup() and *topdown() functions, and check whether
>   mm->get_unmapped_area is equal to arch_get_unmapped_area() instead of
>   checking mmap_is_legacy() to determine which function should be used.
>
> Changes in V4:
> * Follow the suggestions of Will and Mike: move this patch back to core
>   code.
>
> Changes in V5:
> * Fix a kbuild test error.
>
> ------
>
> In a 32-bit program running on the arm64 architecture, when the address
> space below the mmap base is completely exhausted, shmat() for huge
> pages returns ENOMEM, while shmat() for normal pages can still succeed
> in non-legacy mode. This seems unfair.
>
> For normal pages, the get_unmapped_area() calling flow is:
> => mm->get_unmapped_area()
>    in legacy mode:
>    => arch_get_unmapped_area()
>       => vm_unmapped_area()
>    in non-legacy mode:
>    => arch_get_unmapped_area_topdown()
>       => vm_unmapped_area()
>
> For huge pages, the get_unmapped_area() calling flow is:
> => file->f_op->get_unmapped_area()
>    => hugetlb_get_unmapped_area()
>       => vm_unmapped_area()
>
> To solve this issue, we only need to make hugetlb_get_unmapped_area()
> take the same path as mm->get_unmapped_area().
> Add *bottomup() and *topdown() functions, and check the current
> mm->get_unmapped_area() to decide which one to use: if
> mm->get_unmapped_area is equal to arch_get_unmapped_area(),
> hugetlb_get_unmapped_area() calls the bottom-up routine; otherwise it
> calls the top-down routine.
>
> Signed-off-by: Shijie Hu <hushijie3@xxxxxxxxxx>
> Reported-by: kbuild test robot <lkp@xxxxxxxxx>
> ---
>  fs/hugetlbfs/inode.c | 62 +++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 54 insertions(+), 8 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 991c60c7ffe0..61418380f492 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -38,6 +38,7 @@
>  #include <linux/uio.h>
>  
>  #include <linux/uaccess.h>
> +#include <linux/sched/mm.h>
>  
>  static const struct super_operations hugetlbfs_ops;
>  static const struct address_space_operations hugetlbfs_aops;
> @@ -191,13 +192,60 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  
>  #ifndef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
>  static unsigned long
> +hugetlb_get_unmapped_area_bottomup(struct file *file, unsigned long addr,
> +		unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> +	struct hstate *h = hstate_file(file);
> +	struct vm_unmapped_area_info info;
> +
> +	info.flags = 0;
> +	info.length = len;
> +	info.low_limit = current->mm->mmap_base;
> +	info.high_limit = TASK_SIZE;
> +	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
> +	info.align_offset = 0;
> +	return vm_unmapped_area(&info);
> +}
> +
> +static unsigned long
> +hugetlb_get_unmapped_area_topdown(struct file *file, unsigned long addr,
> +		unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> +	struct hstate *h = hstate_file(file);
> +	struct vm_unmapped_area_info info;
> +
> +	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> +	info.length = len;
> +	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
> +	info.high_limit = current->mm->mmap_base;
> +	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
> +	info.align_offset = 0;
> +	addr = vm_unmapped_area(&info);
> +
> +	/*
> +	 * A failed mmap() very likely causes application failure,
> +	 * so fall back to the bottom-up function here. This scenario
> +	 * can happen with large stack limits and large mmap()
> +	 * allocations.
> +	 */
> +	if (unlikely(offset_in_page(addr))) {
> +		VM_BUG_ON(addr != -ENOMEM);
> +		info.flags = 0;
> +		info.low_limit = current->mm->mmap_base;
> +		info.high_limit = TASK_SIZE;
> +		addr = vm_unmapped_area(&info);
> +	}
> +
> +	return addr;
> +}
> +
> +static unsigned long
>  hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>  		unsigned long len, unsigned long pgoff, unsigned long flags)
>  {
>  	struct mm_struct *mm = current->mm;
>  	struct vm_area_struct *vma;
>  	struct hstate *h = hstate_file(file);
> -	struct vm_unmapped_area_info info;
>  
>  	if (len & ~huge_page_mask(h))
>  		return -EINVAL;
> @@ -218,13 +266,11 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>  		return addr;
>  	}
>  
> -	info.flags = 0;
> -	info.length = len;
> -	info.low_limit = TASK_UNMAPPED_BASE;
> -	info.high_limit = TASK_SIZE;
> -	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
> -	info.align_offset = 0;
> -	return vm_unmapped_area(&info);
> +	if (mm->get_unmapped_area == arch_get_unmapped_area)
> +		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
> +				pgoff, flags);
> +	return hugetlb_get_unmapped_area_topdown(file, addr, len,
> +			pgoff, flags);

I like this code using the value of mm->get_unmapped_area to determine
which routine to call; it is used by a few architectures. However, I
noticed that on at least one architecture (powerpc),
mm->get_unmapped_area may be assigned routines other than
arch_get_unmapped_area or arch_get_unmapped_area_topdown. In such a
case, we would call the 'new' topdown routine. I would prefer that we
call the bottom-up routine in this default case.
In reality, this does not impact powerpc, as that architecture has its
own hugetlb_get_unmapped_area routine. Because of this, I suggest we add
a comment above this code and switch the if/else order. For example:

+	/*
+	 * Use mm->get_unmapped_area value as a hint to use topdown routine.
+	 * If architectures have special needs, they should define their own
+	 * version of hugetlb_get_unmapped_area.
+	 */
+	if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
+		return hugetlb_get_unmapped_area_topdown(file, addr, len,
+				pgoff, flags);
+	return hugetlb_get_unmapped_area_bottomup(file, addr, len,
+			pgoff, flags);

Thoughts?
-- 
Mike Kravetz

> }
> #endif
>