Re: [PATCH v2] mm, hugepages: add mremap() support for hugepage backed vma

Mike Kravetz <mike.kravetz@xxxxxxxxxx> · Fri, 20 Aug 2021 14:00:53 -0700

On 8/18/21 4:35 PM, Mina Almasry wrote:
> On Fri, Aug 13, 2021 at 4:40 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>> Earlier in the mremap code, the following lines exist:
>>
>>         old_len = PAGE_ALIGN(old_len);
>>         new_len = PAGE_ALIGN(new_len);
>>
>> So, the passed length values are page aligned.  This allows 'sloppy'
>> values to be passed by users.
>>
>> Should we do the same for hugetlb mappings?  In mmap we have different
>> requirements for hugetlb mappings:
>>
>> " Huge page (Huge TLB) mappings
>>        For mappings that employ huge pages, the requirements for the arguments
>>        of  mmap()  and munmap() differ somewhat from the requirements for map‐
>>        pings that use the native system page size.
>>
>>        For mmap(), offset must be a multiple of the underlying huge page size.
>>        The system automatically aligns length to be a multiple of the underly‐
>>        ing huge page size.
>>
>>        For munmap(), addr and length must both be a multiple of the underlying
>>        huge page size.
>> "
>>
>> I actually wish arguments for hugetlb mappings would be treated the same
>> as for base page size mappings.  We can not change mmap as legacy code
>> may depend on the different requirements.  Since mremap for hugetlb is
>> new, should we treat arguments for hugetlb mappings the same as for base
>> pages (align to huge page boundary)?  My vote is yes, but it would be
>> good to get other opinions.
>>
>> If we do not align for hugetlb mappings as we do for base page mappings,
>> then this will also need to be documented.
>>
>> Another question,
>> Should we possibly check addr and new_addr alignment here as well?
>> addr has been previously checked for PAGE alignment and new_addr is
>> checked for PAGE alignment at the beginning of mremap_to().
>>
> 
> I'll yield to whatever you decide here because I reckon you have much
> more experience and better judgement here. But my thoughts:
> 
> 'Sane' usage of mremap() is something like:
> 1. mmap() a hugetlbfs vma.
> 2. Pass the vma received from step (1) to mremap() to remap it to a
> different location.
> 
> I don't know if there is another usage pattern I need to worry about
> but given the above, old_addr and old_len will be hugepage aligned
> already since they are values returned by the previous mmap() call
> which aligns them, no? So, I think aligning old_addr and old_len to
> the hugepage boundary is fine.
> 
> With this support we don't allow mremap() expansion. In my use case
> old_len==new_len actually. I think it's fine to also align new_len to
> the hugepage boundary.
> 
> I already have this code that errors out if the lengths are not aligned:
> 
> if (old_len & ~huge_page_mask(h) || new_len & ~huge_page_mask(h))
>     goto out;
> 
> I think aligning new_addr breaks my use case though. In my use case
> new_addr is the start of the text segment in the ELF executable, and I
> don't think that's guaranteed to be anything but page aligned.
> Aligning new_addr seems like it would break my use case.

That is interesting.  I assumed there was hugetlb code written under the
assumption vmas/mappings were always huge page aligned.  I thought the
code would fall over quite quickly if vma was not huge page aligned.

Your use case/statement above surprised me.

So, I took your provided test case (V3 patch) and tried to make the
destination address page aligned but not huge page aligned.  In every case,
mremap failed: the routine hugetlb_get_unmapped_area() requires huge page
alignment.  Not sure how this works for you?
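
To make the distinction concrete, here is a minimal userspace sketch of the
two alignment tests being discussed.  It is not kernel code: the 4 KiB base
page size, the 2 MiB huge page size, and the sketch_* helper names are all
assumptions chosen for illustration.  The mask helper only mirrors the shape
of the huge_page_mask(h) check quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages and 2 MiB huge pages (typical on x86-64). */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << 12)
#define SKETCH_HPAGE_SIZE (UINT64_C(1) << 21)

/* Hypothetical helper: mask that clears the offset bits within a page. */
static inline uint64_t sketch_mask(uint64_t size)
{
	/* The mask trick is only valid for power-of-two sizes. */
	assert(size != 0 && (size & (size - 1)) == 0);
	return ~(size - 1);
}

/* True if value is a multiple of size; same form as `x & ~mask` above. */
static inline bool sketch_is_aligned(uint64_t value, uint64_t size)
{
	return (value & ~sketch_mask(size)) == 0;
}

/* Round value up to the next multiple of size, as PAGE_ALIGN() does. */
static inline uint64_t sketch_align_up(uint64_t value, uint64_t size)
{
	return (value + size - 1) & sketch_mask(size);
}
```

Under these assumptions an address such as 0x401000 passes the base page
test but fails the huge page test, which is the kind of destination address
that was rejected in the experiment above.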

> Aligning new_addr seems like it would break my use case. If you insist
> though I'm happy aligning new_addr in the upstream kernel and not
> doing that in our kernel, but if I'm not particularly happy with the
> hugepage alignment I'd say it is likely future users of hugetlb
> mremap() also won't like the hugepage alignment, but I yield to you
> here.

I am now a bit confused and do not see how this works for your use case?
-- 
Mike Kravetz