On Tue, Oct 22, 2019 at 11:41:57AM -0700, John Hubbard wrote: > On 10/22/19 10:14 AM, Jerome Glisse wrote: > > On Mon, Oct 21, 2019 at 02:24:35PM -0700, John Hubbard wrote: > >> The MAP_HUGETLB ("-H" option) of gup_benchmark fails: > >> > >> $ sudo ./gup_benchmark -H > >> mmap: Invalid argument > >> > >> This is because gup_benchmark.c is passing in a file descriptor to > >> mmap(), but the fd came from opening up the /dev/zero file. This > >> confuses the mmap syscall implementation, which thinks that, if the > >> caller did not specify MAP_ANONYMOUS, then the file must be a huge > >> page file. So it attempts to verify that the file really is a huge > >> page file, as you can see here: > >> > >> ksys_mmap_pgoff() > >> { > >> if (!(flags & MAP_ANONYMOUS)) { > >> retval = -EINVAL; > >> if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file))) > >> goto out_fput; /* THIS IS WHERE WE END UP */ > >> > >> else if (flags & MAP_HUGETLB) { > >> ...proceed normally, /dev/zero is ok here... > >> > >> ...and of course is_file_hugepages() returns "false" for the /dev/zero > >> file. > >> > >> The problem is that the user space program, gup_benchmark.c, really just > >> wants anonymous memory here. The simplest way to get that is to pass > >> MAP_ANONYMOUS whenever MAP_HUGETLB is specified, so that's what this > >> patch does. > > > > This looks wrong, MAP_HUGETLB should only be use to create vma > > for hugetlbfs. If you want anonymous private vma do not set the > > MAP_HUGETLB. If you want huge page inside your anonymous vma > > there is nothing to do at the mmap time, this is the job of the > > transparent huge page code (THP). > > > > Not the point. Please look more closely at ksys_mmap_pgoff(). You'll > see that, since 2009 (and probably earlier; 2009 is just when Hugh Dickens > moved it over from util.c), this routine has had full support for using > hugetlbfs automatically, via mmap. > > It does that via hugetlb_file_setup(): > > unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len, > unsigned long prot, unsigned long flags, > unsigned long fd, unsigned long pgoff) > { > ... > if (!(flags & MAP_ANONYMOUS)) { > ... > } else if (flags & MAP_HUGETLB) { > struct user_struct *user = NULL; > struct hstate *hs; > > hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK); > if (!hs) > return -EINVAL; > > len = ALIGN(len, huge_page_size(hs)); > /* > * VM_NORESERVE is used because the reservations will be > * taken when vm_ops->mmap() is called > * A dummy user value is used because we are not locking > * memory so no accounting is necessary > */ > file = hugetlb_file_setup(HUGETLB_ANON_FILE, len, > VM_NORESERVE, > &user, HUGETLB_ANONHUGE_INODE, > (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK); > if (IS_ERR(file)) > return PTR_ERR(file); > } > ... > > > Also, there are 14 (!) other pre-existing examples of passing > MAP_HUGETLB | MAP_ANONYMOUS to mmap, so I'm not exactly the first one > to reach this understanding. > > > > NAK as misleading > > Ouch. But I think I'm actually leading correctly, rather than misleading. > Can you prove me wrong? :) So i was misslead by the file descriptor, passing a file descriptor and asking for anonymous always bugs me. But yeah the _linux_ kernel is happy to ignore the file argument if you set the anonymous flag. I guess the rules of passing -1 for fd when anonymous is just engrave in my brain. Also i thought that the file was an argument of the test and thus that for huge you needed to pass a hugetlbfs' file. Anyway my mistake, you are right, you can pass a file and ask for anonymous and hugetlb at the same time. Reviewed-by: Jérôme Glisse <jglisse@xxxxxxxxxx>