On Thu, Jul 11, 2024 at 07:32:23PM GMT, Haoran Jiang wrote: > On Wed, 2024-07-10 at 09:24 +0100, Lorenzo Stoakes wrote: > > On Wed, Jul 10, 2024 at 01:45:58PM GMT, Haoran Jang wrote: > > > From: Haoran Jiang <jianghaoran@xxxxxxxxxx> > > > > > > munmap hugepge mappings, if the length of the range to munmap > > > is not aligned with hugepage size,munmap will fail. > > > In the hugetlb_vm_op_split function, an error will be returned > > > if startaddr+len is not hugepage size aligned. > > > > > > before this patch: > > > in "tools/testing/selftests/mm/hugepage-mremap.c" > > > modify DEFAULT_LENGTH_MB to 3M,compile and run, > > > the following error message is displayed > > > > > > ------------------------- > > > running ./hugepage-mremap > > > ------------------------- > > > TAP version 13 > > > 1..1 > > > Map haddr: Returned address is 0x7eaa40000000 > > > Map daddr: Returned address is 0x7daa40000000 > > > Map vaddr: Returned address is 0x7faa40000000 > > > Address returned by mmap() = 0x7cb34b000000 > > > Mremap: Returned address is 0x7faa40000000 > > > First hex is 0 > > > First hex is 3020100 > > > Bail out! mremap: Expected failure, but call succeeded > > > > > > Signed-off-by: Haoran Jiang <jianghaoran@xxxxxxxxxx> > > > --- > > > mm/mmap.c | 10 +++++++++- > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index 83b4682ec85c..0b3a60bf9b6f 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -2733,7 +2733,15 @@ int do_vmi_munmap(struct vma_iterator *vmi, > > > struct mm_struct *mm, > > > if ((offset_in_page(start)) || start > TASK_SIZE || len > > > > TASK_SIZE-start) > > > return -EINVAL; > > > > > > - end = start + PAGE_ALIGN(len); > > > + vma = find_vma(mm, start); > > > + if (!vma) { > > > + if (unlock) > > > + mmap_write_unlock(mm); > > > + return 0; > > > + } > > > > I really don't like this, firstly we're duplicating the VMA lookup > > (we > > vma_find() below), and we fail to use the iterator here, and also we > > are > > duplicating the unlock logic. > > > > Also the semantics seem wrong, we are looking for a VMA that ends at > > or > > after start, so you're just checking to see if start is past the last > > VMA > > in the mm aren't you? > > > > This doesn't seem to be accomplishing anything too useful, unless I'm > > missing something? > > > > > + > > > + end = start + ALIGN(len, vma_kernel_pagesize(vma)); > > > + > > > > This seems to be the 'action' part of the change, but I'm concerned > > this is > > completely broken, because you're using the result of find_vma() > > passed > > into vma_kernel_pagesize() which could find a VMA _after_ the input > > range, > > and end up unmapping a far wider range... > > > > I'm also wondering if we should be doing some hugetlb-specific logic > > here, > > or whether that belongs elsewhere? > > > > Liam can chime in on that. > > > > > if (end == start) > > > return -EINVAL; > > > > > > -- > > > 2.43.0 > > > > 1, While performing an MMAP operation,The length aligned with hugepage > size. > > unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len, > unsigned long prot, unsigned long flags, > unsigned long fd, unsigned long pgoff) > { > ... > if (is_file_hugepages(file)) { > len = ALIGN(len, huge_page_size(hstate_file(file))); > ... > } > > 2,During the munmap, do_vmi_align_munmap->__split_vma(vmi, next, end, > 0)->hugetlb_vm_op_split.It will determine whether the end address is > aligned with hugepage size, and if the end address is not aligned, > return fail. Is there expect the application to align the length? > > hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr) > { > if (addr & ~(huge_page_mask(hstate_vma(vma)))) > return -EINVAL; > > } > > 3,Or after the vma_find (vmi, end), recalculate the end address ? > > ex: > vma = vma_find(vmi, end); > ... > if (is_vm_hugetlb_page(vma)) > { > hugepage_size = huge_page_size(hstate_vma(vma)); > end = start + ALIGN(len, hugepage_size); > } I'm confused as to what you're saying here? I'm addressing your original patch, which was flawed in that it could end up unmapping the incorrect VMA. I guess you're basically asking whether generally we expect applications to be aligned to huge page size? According to the man page: Huge page (Huge TLB) mappings For mappings that employ huge pages, the requirements for the arguments of mmap() and munmap() differ somewhat from the requirements for mappings that use the native system page size. For mmap(), offset must be a multiple of the underlying huge page size. The system automatically aligns length to be a multiple of the underlying huge page size. For munmap(), addr, and length must both be a multiple of the underlying huge page size. So yeah, I think this kills this patch unfortunately. This is expected behaviour and your adjustment of the test is invalid. I do understand why you wanted to adjust this though, but it seems that we explicitly want this behaviour, so sorry about that, but thank you for taking a look at this regardless! :) I would suggest that it's probably a product of there being quite a large difference between rounding up to base page size (which is expected and customary for these kinds of operations) vs. treating a 1 byte length as meaning you want to free, for instance, 2 MiB of data. Equally, it would otherwise cause some confusion around THP pages which are huge but can be split. So sadly I think this is a nack for the patch/concept.