-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 03/26/2015 07:56 AM, Davide Libenzi wrote: > On Wed, 25 Mar 2015, David Rientjes wrote: > >> I looked at this thread at http://marc.info/?t=141392508800001 >> since I didn't have it in my mailbox, and I didn't get a chance >> to actually run your test code. >> >> In short, I think what you're saying is that >> >> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) munmap(ptr, >> 4KB) == EINVAL > > I am not sure you have read the email correctly: > > munmap(mmap(size, HUGETLB), size) = EFAIL > > For every size not multiple of the huge page size. Whereas: > > munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK I think Davide is right here, this is a long existing bug in the MAP_HUGETLB implementation. Specifically, the mmap man page says: All pages containing a part of the indicated range are unmapped, and subsequent references to these pages will generate SIGSEGV. I realize that huge pages may not have been considered by those that wrote the spec. But if I read this I would assume that all pages, regardless of size, touched by the munmap() request should be unmapped. Please include Acked-by: Eric B Munson <emunson@xxxxxxxxxx> to the original patch. I would like to see the mmap man page adjusted to make note of this behavior as well. > > >> Respecting the mmap(2) POSIX specification? I don't think >> mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) mapping 2MB violates >> POSIX.1-2001 and not only because it obviously doesn't address >> MAP_HUGETLB, but I don't think the spec says the system cannot >> map more memory than len. >> >> Using MAP_HUGETLB is really more a library function than anything >> else since you could easily implement the same behavior in a >> library. That function includes aligning len to the hugepage >> size, so doing >> >> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) >> >> is the equivalent to doing >> >> ptr = mmap(..., hugepage_size, ..., MAP_HUGETLB | ..., ...) >> >> and that doesn't violate any spec. But your patch doesn't change >> mmap() at all, so let's forget about that. > > That is what every mmap() implementation does, irrespectively of > any page size. And that is also what the POSIX spec states. The > size will be automatically rounded up to a multiple of the > underline physical page size. The problem is not mmap() though, in > this case. > > >> The question you pose is whether munmap(ptr, 4KB) should succeed >> for a hugetlb vma and in your patch you align this to the >> hugepage size of the vma in the same manner that munmap(ptr, 2KB) >> would be aligned to PAGE_SIZE for a non-hugetlb vma. >> >> The munmap() spec says the whole pages that include any part of >> the passed length should be unmapped. In spirit, I would agree >> with you that the page size for the vma is the hugepage size so >> that would be what would be unmapped. >> >> But that's going by a spec that doesn't address hugepages and is >> worded in a way that {PAGE_SIZE} is the base unit that both >> mmap() and munmap() is done. It carries no notion of variable >> page sizes and how hugepages should be handled with respect to >> pages of {PAGE_SIZE} length. So I think this is beyond the scope >> of the spec: any length is aligned to PAGE_SIZE, but the munmap() >> behavior for hugetlb vmas is not restricted. >> >> It would seem too dangerous at this point to change the behavior >> of munmap(ptr, 4KB) on a hugetlb vma and that userspace bugs >> could actually arise from aligning to the hugepage size. > > You mean, there is an harder failure than the current failure? :) > > >> Some applications purposefully reserve hugetlb pages by mmap() >> and never munmap() them so they have exclusive access to >> hugepages that were allocated either at boot or runtime by the >> sysadmin. If they depend on the return value of munmap() to >> determine if memory to free is memory dynamically allocated by >> the application or reserved as hugetlb memory, then this would >> cause them to break. I can't say for certain that no such >> application exists. > > The fact that certain applications will seldomly call an API, > should be no reason for API to have bugs, or at the very least, a > bahviour which not only in not documented in the man pages, but > also totally unrespectful of the normal mmap/munmap semantics. > Again, the scenario that you are picturing, is one where an > application relies on a permanent (that is what it is - it always > fails unless the munmap size is multiple than huge page size) > failure of munmap, to do some productive task. An munmap() of huge > page aligned size, will succeed in both case (vanilla, and patch). > > >> Since hugetlb memory is beyond the scope of the POSIX.1-2001 >> munmap() specification, and there's a potential userspace >> breakage if the length becomes hugepage aligned, I think the >> do_unmap() implementation is correct as it stands. > > If the length is huge page aligned, it will be working with or > without patch applied. The problem is for the other 2097151 out of > 2097152 cases, where length is not indeed aligned to 2MB (or > whatever hugepage size is for the architecture). > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAEBAgAGBQJVFBL4AAoJELbVsDOpoOa9KoIQAIi/gAmfaQMvv9GEskFMM8Su x3RKDf7ldHfDYbrwqKBllp/Hp3hrF0IZM5PFvq7edOEAYbgSkhIQAu0QdI2RdVlU iKCjk4/RI2sAtNjHXZ7XrPSipBhWmbWc83cCTo38ZCbaWRwsiZ0p5t7U/K1I35tb IKJPu8vTU3d1VupNp2Fse7VD/ImwO2HWMPXCkTT8KUcyVVKLWGQ37cSsN4jPbkQP yqHhpzGX9P8Qkh0Hs6BFl6kjRheIKIhTD4o8Yq5kJGgXH6O8r09riMIquhogCru+ TFFVW97zjgyjRqSYE3GHu2oCdc4LLuAoRxxDMzaRqcw1C3nqKmtjB3wJEf/Pcina UcbTlqdDjDAHy6adNb06k2q2WNNn3CdoAQIRs/mjSvdDMN+gh7TDWKMXqtv4AODc 3xedLGL2bidHCzmgrxDU1hkRjMR8DoW2MCayQoOSIpOwVhXeA2koNbgHpJD3Gboh 0y9FvLNoXMQkBi8eksatfT/kT4xn25F7OMwdP5u0euGidXsOSLz4p/87bBJpCzmr CptQ/V+T7HgMglby8fLfZK9GE5CuwskMzrevF2cUAhyVIkXoiItD6FAh9v6SOjbh jy4Ctq2xkCbAKhsmrUnoUOVRNlnJ8m0I9Eq1gyWHy6qU3UDmY6XBXbJEPERRbn2T MZfuVLJLSR4Nrm7fdHoL =C3GM -----END PGP SIGNATURE----- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>