On 1/27/22 03:57, David Hildenbrand wrote:
> On 13.01.22 19:03, Mike Kravetz wrote:
>> Userfaultfd selftests for hugetlb does not perform UFFD_EVENT_REMAP
>> testing. However, mremap support was recently added in commit
>> 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed
>> vma"). While attempting to enable mremap support in the test, it was
>> discovered that the mremap test indirectly depends on MADV_DONTNEED.
>>
>> hugetlb does not support MADV_DONTNEED. However, the only thing
>> preventing support is a check in can_madv_lru_vma(). Simply removing
>> the check will enable support.
>>
>> This is sent as a RFC because there is no existing use case calling
>> for hugetlb MADV_DONTNEED support except possibly the userfaultfd test.
>> However, adding support makes sense as it is fairly trivial and brings
>> hugetlb functionality more in line with 'normal' memory.
>>
>
> Just a note:
>
> QEMU doesn't use huge anonymous memory directly (MAP_ANON | MAP_HUGE...)
> but instead always goes either via hugetlbfs or via memfd.
>
> For MAP_PRIVATE hugetlb mappings, fallocate(FALLOC_FL_PUNCH_HOLE) seems
> to get the job done (IOW: also discards private anon pages). See the
> comments in the QEMU code below. I remember that that is somewhat
> inconsistent. For ordinary MAP_PRIVATE mapped files I remember that we
> always need fallocate(FALLOC_FL_PUNCH_HOLE) + madvise(QEMU_MADV_DONTNEED)
> to make sure
>
> a) All file pages are removed
> b) All private anon pages are removed
>
> IIRC hugetlbfs really is different in that regard, but maybe other fs
> behave similarly.

Yes it is really different. And, some might even consider that a bug?
Imagine if those private anon (COW) pages contain important data. They
could be unmapped/freed by some other process that has write access to
the hugetlb file on which the private mapping is based.

I believe this same issue exists for hugetlbfs ftruncate.
When fallocate hole punch support was added, it was based on the ftruncate
functionality. I am hesitant to change the behavior of hugetlb hole punch
or truncate as people may be relying on that behavior today. Your QEMU
example is one such case.

Thanks,
-- 
Mike Kravetz
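
For reference, the ordinary (non-hugetlb) MAP_PRIVATE case David describes
can be sketched roughly as below. This is only an illustration, not QEMU's
code: discard_range(), demo_private_discard() and struct discard_obs are
made-up names, and the file comes from memfd_create() without MFD_HUGETLB,
so it is plain shmem (assumes Linux with glibc 2.27 or later). On such a
file, hole punch alone should leave the private COW page in place, and the
additional madvise(MADV_DONTNEED) is what drops it.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/mman.h>
#include <unistd.h>

/* Observations recorded by the demo (illustrative names only). */
struct discard_obs {
    char priv_after_punch;   /* byte seen via private mapping after punch alone */
    char file_after_punch;   /* byte read from the file after punch alone */
    char priv_after_discard; /* byte seen after punch + MADV_DONTNEED */
};

/* Hypothetical helper: drop file pages, then private anon (COW) pages. */
static int discard_range(int fd, void *addr, off_t off, size_t len)
{
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  off, (off_t)len))
        return -1;
    return madvise(addr, len, MADV_DONTNEED);
}

/* Returns 0 and fills *obs on success, -1 if any system call fails. */
static int demo_private_discard(struct discard_obs *obs)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    volatile char *p;
    int ret = -1;
    int fd = memfd_create("discard-demo", 0); /* ordinary shmem, not hugetlb */

    if (fd < 0)
        return -1;

    /* Give the file one page with a non-zero first byte. */
    if (ftruncate(fd, (off_t)len) || pwrite(fd, "F", 1, 0) != 1)
        goto out_close;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        goto out_close;

    p[0] = 'P'; /* write fault creates a private anon (COW) copy */

    /* Step 1: hole punch alone removes the file page only. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, (off_t)len))
        goto out_unmap;
    obs->priv_after_punch = p[0];
    if (pread(fd, &obs->file_after_punch, 1, 0) != 1)
        goto out_unmap;

    /* Step 2: punch + MADV_DONTNEED also drops the private copy. */
    if (discard_range(fd, (void *)p, 0, len))
        goto out_unmap;
    obs->priv_after_discard = p[0];
    ret = 0;

out_unmap:
    munmap((void *)p, len);
out_close:
    close(fd);
    return ret;
}
```

A caller would pass a struct discard_obs to demo_private_discard() and
compare the three bytes. On hugetlbfs the first observation is where the
behavior discussed above differs, since hole punch there also discards the
private pages.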