On 2/10/22 05:09, David Hildenbrand wrote: > On 08.02.22 00:47, Mike Kravetz wrote: >> On 2/4/22 00:35, David Hildenbrand wrote: >>>> I thought this was simple. :) >>> >>> It really bugs me that it's under-specified what's supposed to happen >>> when the length is not aligned. >>> >>> BUT: in the posix world, "calling posix_madvise() shall not affect the >>> semantics of access to memory in the specified range". So we don't care >>> too much about if we align up/down, because it wouldn't affect the >>> semantics. Especially for MADV_DONTNEED/MADV_REMOVE as implemented by >>> Linux this is certainly different and the alignment handling matters. >>> >>> So I guess especially for MADV_DONTNEED/MADV_REMOVE we need a clear >>> specification what's supposed to happen if the length falls into the >>> middle of a huge page. We should document alignment handling for >>> madvise() in general I assume. >>> >>> IMHO we should have bailed out right from the start whenever something >>> is not properly aligned, but that ship has sailed. So I agree, maybe we >>> can make at least hugetlb MADV_DONTNEED obey the same (weird) rules as >>> ordinary pages. >>> >>> So b) would mean, requiring start to be hugepage aligned and aligning-up >>> the end. Still feels wrong but at least matches existing semantics. >>> >>> Hugetlb MADV_REMOVE semantics are unfortunate and we should document the >>> exception. >> >> Thank you for all your comments David! >> >> So, my plan was to make MADV_DONTNEED behave as described above: >> - EINVAL if start address not huge page size aligned >> - Align end/length up to huge page size. >> >> The code I had for this was very specific to MADV_DONTNEED. I then thought, >> why not do the same for MADV_REMOVE as well? Or even more general, add this >> check and alignment to the vma parsing code in madvise. >> >> It was then that I realized there are several madvise behaviors that take >> non-huge page size aligned addresses for hugetlb mappings today. Making >> huge page size alignment a requirement for all madvise behaviors could break >> existing code. So, it seems like it could only be added to MADV_DONTNEED as >> this functionality does not exist today. We then end up with MADV_DONTNEED >> as the only behavior requiring huge page size alignment for hugetlb mappings. >> Sigh!!! > > :/ > >> >> I am now rethinking the decision to proceed with b) as described above. >> >> With the exception of MADV_REMOVE (which we may be able to change for >> hugetlb), madvise operations operate on huge page size pages for hugetlb >> mappings. If start address is in the middle of a hugetlb page, we essentially >> align down to the beginning of the hugetlb page. If length lands in the >> middle of a hugetlb page, we essentially round up. > > Which MADV calls would be affected? Not sure I understand the question. I was saying that madvise calls which operate on hugetlb mappings today only operate on huge pages. So, this is essentially align down starting address and align up end address. For example consider the MADV_POPULATE calls you recently added. They will only fault in huge pages in a hugetlb vma. > The "bad" thing about MADV_DONTNEED and MADV_REMOVE are that they > destroy data, which is why they heavily diverge from the original posix > madvise odea. Hmmm. That may be a good argument for strict alignment. We do not want to remove (or unmap) more than the user intended. The unmap system call has such alignment requirements. Darn! I'm thinking that I should go back to requiring alignment for MADV_DONTNEED. There really is no 'right' answer. >> >> When adding MADV_REMOVE perhaps we should go with this align down start and >> align up end strategy that is used everywhere else? I really wish we could >> go back and change things, but as you know it is too late for that. > > I assume whatever we do, we should document it at least cleanly in the > man page. Unfortunately, hugetlb is a gift that keeps on giving. Making > it at least somehow consistent, even if it's "hugtlb being consistent in > its own mess", that would be preferable I guess. Yes, more than anything we need to document behavior. -- Mike Kravetz