On 10/21/22 19:28, Rik van Riel wrote:
> A common use case for hugetlbfs is for the application to create
> memory pools backed by huge pages, which then get handed over to
> some malloc library (eg. jemalloc) for further management.
>
> That malloc library may be doing MADV_DONTNEED calls on memory
> that is no longer needed, expecting those calls to happen on
> PAGE_SIZE boundaries.
>
> However, currently the MADV_DONTNEED code rounds up any such
> requests to HPAGE_PMD_SIZE boundaries. This leads to undesired
> outcomes when jemalloc expects a 4kB MADV_DONTNEED, but 2MB of
> memory get zeroed out, instead.
>
> Use of pre-built shared libraries means that user code does not
> always know the page size of every memory arena in use.
>
> Avoid unexpected data loss with MADV_DONTNEED by rounding up
> only to PAGE_SIZE (in do_madvise), and rounding down to huge
> page granularity.
>
> That way programs will only get as much memory zeroed out as
> they requested.
>
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxx
> Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")

I do hate changing behavior, but in hindsight this is the right
behavior, especially since the old behavior can cause unexpected
data loss.

Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
--
Mike Kravetz

> ---
> v2: split out the most urgent fix for stable backports
>
>  mm/madvise.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 2baa93ca2310..c7105ec6d08c 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -813,7 +813,14 @@ static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
>  	if (start & ~huge_page_mask(hstate_vma(vma)))
>  		return false;
>
> -	*end = ALIGN(*end, huge_page_size(hstate_vma(vma)));
> +	/*
> +	 * Madvise callers expect the length to be rounded up to PAGE_SIZE
> +	 * boundaries, and may be unaware that this VMA uses huge pages.
> +	 * Avoid unexpected data loss by rounding down the number of
> +	 * huge pages freed.
> +	 */
> +	*end = ALIGN_DOWN(*end, huge_page_size(hstate_vma(vma)));
> +
>  	return true;
>  }
>
> @@ -828,6 +835,9 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>  	if (!madvise_dontneed_free_valid_vma(vma, start, &end, behavior))
>  		return -EINVAL;
>
> +	if (start == end)
> +		return 0;
> +
>  	if (!userfaultfd_remove(vma, start, end)) {
>  		*prev = NULL;	/* mmap_lock has been dropped, prev is stale */
>
> --
> 2.37.2
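
For anyone who wants to reproduce the behavior change from userspace,
below is a minimal illustrative program (not part of the patch). It
assumes a 2MB default huge page size and that hugetlb pages have been
reserved (e.g. via /proc/sys/vm/nr_hugepages); it stands in for the
jemalloc-style caller described in the changelog. With the fix, the
4kB MADV_DONTNEED is rounded down to huge page granularity, becomes a
zero-length request, and the data survives; before the fix, the whole
2MB huge page was zapped and the first byte reads back as zero.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HUGE_SIZE	(2UL * 1024 * 1024)	/* assumed default huge page size */

int main(void)
{
	/* Memory pool backed by huge pages, as a malloc library might manage. */
	char *pool = mmap(NULL, HUGE_SIZE, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (pool == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(pool, 0xaa, HUGE_SIZE);

	/*
	 * A 4kB MADV_DONTNEED, issued without knowing the arena is
	 * huge-page backed.
	 */
	if (madvise(pool, 4096, MADV_DONTNEED))
		perror("madvise");

	/* Prints 0xaa with this patch applied; 0 on kernels without it. */
	printf("first byte after MADV_DONTNEED: %#x\n",
	       (unsigned char)pool[0]);

	munmap(pool, HUGE_SIZE);
	return 0;
}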