Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/26/2015 07:56 AM, Davide Libenzi wrote:
> On Wed, 25 Mar 2015, David Rientjes wrote:
> 
>> I looked at this thread at http://marc.info/?t=141392508800001
>> since I didn't have it in my mailbox, and I didn't get a chance
>> to actually run your test code.
>> 
>> In short, I think what you're saying is that
>> 
>> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) munmap(ptr,
>> 4KB) == EINVAL
> 
> I am not sure you have read the email correctly:
> 
> munmap(mmap(size, HUGETLB), size) = EFAIL
> 
> For every size not multiple of the huge page size. Whereas:
> 
> munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK

I think Davide is right here, this is a long existing bug in the
MAP_HUGETLB implementation.  Specifically, the mmap man page says:

All pages containing a part of the indicated range are unmapped, and
subsequent references to these pages will generate SIGSEGV.

I realize that huge pages may not have been considered by those that
wrote the spec.  But if I read this I would assume that all pages,
regardless of size, touched by the munmap() request should be unmapped.

Please include
Acked-by: Eric B Munson <emunson@xxxxxxxxxx>
to the original patch.  I would like to see the mmap man page adjusted
to make note of this behavior as well.

> 
> 
>> Respecting the mmap(2) POSIX specification?  I don't think 
>> mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) mapping 2MB violates
>>  POSIX.1-2001 and not only because it obviously doesn't address 
>> MAP_HUGETLB, but I don't think the spec says the system cannot
>> map more memory than len.
>> 
>> Using MAP_HUGETLB is really more a library function than anything
>> else since you could easily implement the same behavior in a
>> library.  That function includes aligning len to the hugepage
>> size, so doing
>> 
>> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
>> 
>> is the equivalent to doing
>> 
>> ptr = mmap(..., hugepage_size, ..., MAP_HUGETLB | ..., ...)
>> 
>> and that doesn't violate any spec.  But your patch doesn't change
>> mmap() at all, so let's forget about that.
> 
> That is what every mmap() implementation does, irrespectively of
> any page size. And that is also what the POSIX spec states. The
> size will be automatically rounded up to a multiple of the
> underline physical page size. The problem is not mmap() though, in
> this case.
> 
> 
>> The question you pose is whether munmap(ptr, 4KB) should succeed
>> for a hugetlb vma and in your patch you align this to the
>> hugepage size of the vma in the same manner that munmap(ptr, 2KB)
>> would be aligned to PAGE_SIZE for a non-hugetlb vma.
>> 
>> The munmap() spec says the whole pages that include any part of
>> the passed length should be unmapped.  In spirit, I would agree
>> with you that the page size for the vma is the hugepage size so
>> that would be what would be unmapped.
>> 
>> But that's going by a spec that doesn't address hugepages and is
>> worded in a way that {PAGE_SIZE} is the base unit that both
>> mmap() and munmap() is done.  It carries no notion of variable
>> page sizes and how hugepages should be handled with respect to
>> pages of {PAGE_SIZE} length.  So I think this is beyond the scope
>> of the spec: any length is aligned to PAGE_SIZE, but the munmap()
>> behavior for hugetlb vmas is not restricted.
>> 
>> It would seem too dangerous at this point to change the behavior
>> of munmap(ptr, 4KB) on a hugetlb vma and that userspace bugs
>> could actually arise from aligning to the hugepage size.
> 
> You mean, there is an harder failure than the current failure? :)
> 
> 
>> Some applications purposefully reserve hugetlb pages by mmap()
>> and never munmap() them so they have exclusive access to
>> hugepages that were allocated either at boot or runtime by the
>> sysadmin.  If they depend on the return value of munmap() to
>> determine if memory to free is memory dynamically allocated by
>> the application or reserved as hugetlb memory, then this would
>> cause them to break.  I can't say for certain that no such 
>> application exists.
> 
> The fact that certain applications will seldomly call an API,
> should be no reason for API to have bugs, or at the very least, a
> bahviour which not only in not documented in the man pages, but
> also totally unrespectful of the normal mmap/munmap semantics. 
> Again, the scenario that you are picturing, is one where an
> application relies on a permanent (that is what it is - it always
> fails unless the munmap size is multiple than huge page size)
> failure of munmap, to do some productive task. An munmap() of huge
> page aligned size, will succeed in both case (vanilla, and patch).
> 
> 
>> Since hugetlb memory is beyond the scope of the POSIX.1-2001
>> munmap() specification, and there's a potential userspace
>> breakage if the length becomes hugepage aligned, I think the
>> do_unmap() implementation is correct as it stands.
> 
> If the length is huge page aligned, it will be working with or
> without patch applied. The problem is for the other 2097151 out of
> 2097152 cases, where length is not indeed aligned to 2MB (or
> whatever hugepage size is for the architecture).
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQIcBAEBAgAGBQJVFBL4AAoJELbVsDOpoOa9KoIQAIi/gAmfaQMvv9GEskFMM8Su
x3RKDf7ldHfDYbrwqKBllp/Hp3hrF0IZM5PFvq7edOEAYbgSkhIQAu0QdI2RdVlU
iKCjk4/RI2sAtNjHXZ7XrPSipBhWmbWc83cCTo38ZCbaWRwsiZ0p5t7U/K1I35tb
IKJPu8vTU3d1VupNp2Fse7VD/ImwO2HWMPXCkTT8KUcyVVKLWGQ37cSsN4jPbkQP
yqHhpzGX9P8Qkh0Hs6BFl6kjRheIKIhTD4o8Yq5kJGgXH6O8r09riMIquhogCru+
TFFVW97zjgyjRqSYE3GHu2oCdc4LLuAoRxxDMzaRqcw1C3nqKmtjB3wJEf/Pcina
UcbTlqdDjDAHy6adNb06k2q2WNNn3CdoAQIRs/mjSvdDMN+gh7TDWKMXqtv4AODc
3xedLGL2bidHCzmgrxDU1hkRjMR8DoW2MCayQoOSIpOwVhXeA2koNbgHpJD3Gboh
0y9FvLNoXMQkBi8eksatfT/kT4xn25F7OMwdP5u0euGidXsOSLz4p/87bBJpCzmr
CptQ/V+T7HgMglby8fLfZK9GE5CuwskMzrevF2cUAhyVIkXoiItD6FAh9v6SOjbh
jy4Ctq2xkCbAKhsmrUnoUOVRNlnJ8m0I9Eq1gyWHy6qU3UDmY6XBXbJEPERRbn2T
MZfuVLJLSR4Nrm7fdHoL
=C3GM
-----END PGP SIGNATURE-----

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]