Re: [PATCH v2] mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/21/22 19:28, Rik van Riel wrote:
> A common use case for hugetlbfs is for the application to create
> memory pools backed by huge pages, which then get handed over to
> some malloc library (eg. jemalloc) for further management.
> 
> That malloc library may be doing MADV_DONTNEED calls on memory
> that is no longer needed, expecting those calls to happen on
> PAGE_SIZE boundaries.
> 
> However, currently the MADV_DONTNEED code rounds up any such
> requests to HPAGE_PMD_SIZE boundaries. This leads to undesired
> outcomes when jemalloc expects a 4kB MADV_DONTNEED, but 2MB of
> memory get zeroed out, instead.
> 
> Use of pre-built shared libraries means that user code does not
> always know the page size of every memory arena in use.
> 
> Avoid unexpected data loss with MADV_DONTNEED by rounding up
> only to PAGE_SIZE (in do_madvise), and rounding down to huge
> page granularity.
> 
> That way programs will only get as much memory zeroed out as
> they requested.
> 
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxx
> Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")

I do hate changing behavior, but in hindsight this is the right behavior.
Especially, since it can cause unexpected data loss.

Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
-- 
Mike Kravetz

> ---
> v2: split out the most urgent fix for stable backports
> 
>  mm/madvise.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 2baa93ca2310..c7105ec6d08c 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -813,7 +813,14 @@ static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
>  	if (start & ~huge_page_mask(hstate_vma(vma)))
>  		return false;
>  
> -	*end = ALIGN(*end, huge_page_size(hstate_vma(vma)));
> +	/*
> +	 * Madvise callers expect the length to be rounded up to PAGE_SIZE
> +	 * boundaries, and may be unaware that this VMA uses huge pages.
> +	 * Avoid unexpected data loss by rounding down the number of
> +	 * huge pages freed.
> +	 */
> +	*end = ALIGN_DOWN(*end, huge_page_size(hstate_vma(vma)));
> +
>  	return true;
>  }
>  
> @@ -828,6 +835,9 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>  	if (!madvise_dontneed_free_valid_vma(vma, start, &end, behavior))
>  		return -EINVAL;
>  
> +	if (start == end)
> +		return 0;
> +
>  	if (!userfaultfd_remove(vma, start, end)) {
>  		*prev = NULL; /* mmap_lock has been dropped, prev is stale */
>  
> -- 
> 2.37.2
> 
> 




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux