(Sorry for the late comment)

On Tue, Feb 01, 2022 at 05:40:32PM -0800, Mike Kravetz wrote:
> MADV_DONTNEED is currently disabled for hugetlb mappings.  This
> certainly makes sense in shared file mappings as the pagecache maintains
> a reference to the page and it will never be freed.  However, it could
> be useful to unmap and free pages in private mappings.
>
> The only thing preventing MADV_DONTNEED from working on hugetlb mappings
> is a check in can_madv_lru_vma().  To allow support for hugetlb mappings
> create and use a new routine madvise_dontneed_free_valid_vma() that will
> allow hugetlb mappings.  Also, before calling zap_page_range in the
> DONTNEED case align start and size to huge page size for hugetlb vmas.
> madvise only requires PAGE_SIZE alignment, but the hugetlb unmap routine
> requires huge page size alignment.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> ---
>  mm/madvise.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 5604064df464..7ae891e030a4 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -796,10 +796,30 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
>  static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
>  					unsigned long start, unsigned long end)
>  {
> +	/*
> +	 * start and size (end - start) must be huge page size aligned
> +	 * for hugetlb vmas.
> +	 */
> +	if (is_vm_hugetlb_page(vma)) {
> +		struct hstate *h = hstate_vma(vma);
> +
> +		start = ALIGN_DOWN(start, huge_page_size(h));
> +		end = ALIGN(end, huge_page_size(h));
> +	}
> +

Maybe check the alignment before userfaultfd_remove()?  Otherwise there'll be a
fake message generated to the tracer.

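To make the rounding concrete, here is a small userspace sketch (not kernel
code).  The SKETCH_* macros are stand-ins for ALIGN_DOWN()/ALIGN(), the struct
and function names are made up for illustration, and the 2 MiB huge page size
used in the note below is only an assumption:

```c
#include <assert.h>

/*
 * Userspace stand-ins for the kernel's ALIGN_DOWN()/ALIGN().
 * Both assume the alignment is a power of two.
 */
#define SKETCH_ALIGN_DOWN(x, a)	((x) & ~((a) - 1UL))
#define SKETCH_ALIGN(x, a)	(((x) + (a) - 1UL) & ~((a) - 1UL))

/* Hypothetical helper type: a half-open range [start, end). */
struct sketch_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Round [start, end) out to huge page boundaries, the same way the
 * quoted hunk does before calling zap_page_range().
 */
static struct sketch_range sketch_round_out(unsigned long start,
					    unsigned long end,
					    unsigned long hpage_size)
{
	struct sketch_range r;

	/* The mask arithmetic above only works for powers of two. */
	assert(hpage_size && (hpage_size & (hpage_size - 1)) == 0);

	r.start = SKETCH_ALIGN_DOWN(start, hpage_size);
	r.end = SKETCH_ALIGN(end, hpage_size);
	return r;
}
```

With a 2 MiB huge page, a 4 KiB request for [0x201000, 0x202000) rounds out to
[0x200000, 0x400000), so the range that is actually zapped can differ from the
range passed in by the caller.
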
>  	zap_page_range(vma, start, end - start);
>  	return 0;
>  }
>
> +static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
> +						int behavior)
> +{
> +	if (is_vm_hugetlb_page(vma))
> +		return behavior == MADV_DONTNEED;
> +	else
> +		return can_madv_lru_vma(vma);
> +}

can_madv_lru_vma() will check hugetlb again, which looks a bit weird.  Would it
look better to write it as:

	madvise_dontneed_free_valid_vma(vma)
	{
		return !(vma->vm_flags & (VM_LOCKED|VM_PFNMAP));
	}

	can_madv_lru_vma(vma)
	{
		return madvise_dontneed_free_valid_vma(vma) &&
		       !is_vm_hugetlb_page(vma);
	}

?

Another use case of DONTNEED upon hugetlbfs could be uffd-minor, because afaiu
this is the only API that can force-strip the hugetlb mapped pgtable without
losing pagecache data.

Thanks,

-- 
Peter Xu
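
P.S. A compilable userspace sketch of the layering suggested above, in case it
helps.  Everything here is a stand-in: struct sketch_vma, the sketch_*
functions and the SKETCH_VM_* bits (arbitrary values, not the kernel's) exist
only for illustration.

```c
#include <stdbool.h>

/* Arbitrary illustrative bits; NOT the kernel's VM_* values. */
#define SKETCH_VM_LOCKED	(1UL << 0)
#define SKETCH_VM_PFNMAP	(1UL << 1)
#define SKETCH_VM_HUGETLB	(1UL << 2)

/* Minimal stand-in for struct vm_area_struct. */
struct sketch_vma {
	unsigned long vm_flags;
};

/* Stand-in for is_vm_hugetlb_page(). */
static bool sketch_is_hugetlb(const struct sketch_vma *vma)
{
	return (vma->vm_flags & SKETCH_VM_HUGETLB) != 0;
}

/* Base check: reject locked and PFN-mapped vmas, allow hugetlb. */
static bool sketch_dontneed_free_valid_vma(const struct sketch_vma *vma)
{
	return !(vma->vm_flags & (SKETCH_VM_LOCKED | SKETCH_VM_PFNMAP));
}

/* LRU check: the base check plus "not hugetlb", tested only once. */
static bool sketch_can_madv_lru_vma(const struct sketch_vma *vma)
{
	return sketch_dontneed_free_valid_vma(vma) && !sketch_is_hugetlb(vma);
}
```

The sketch mirrors the suggestion as written, so it takes no behavior argument;
limiting hugetlb vmas to MADV_DONTNEED would still have to be handled somewhere
else.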