Re: [PATCH] mm/hugetlb: revert use of page_cache_next_miss()

Mike Kravetz <mike.kravetz@xxxxxxxxxx> · Fri, 5 May 2023 14:58:06 -0700

On 05/05/23 11:53, Sidhartha Kumar wrote:
> As reported by Ackerley[1], the use of page_cache_next_miss() in
> hugetlbfs_fallocate() introduces a bug where a second fallocate() call to
> same offset fails with -EEXIST. Revert this change and go back to the
> previous method of using get from the page cache and then dropping the
> reference on success.
> 
> hugetlbfs_pagecache_present() was also refactored to use
> page_cache_next_miss(), revert the usage there as well.
> 
> User visible impacts include hugetlb fallocate incorrectly returning
> EEXIST if pages are already present in the file. In addition, hugetlb
> pages will not be included in core dumps if they need to be brought in via
> GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages
> already present in the cache. It may try to allocate a new page and
> potentially return ENOMEM as opposed to EEXIST.
> 
> Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")

Small nit and a question for people more familiar with stable backports.

d0ce0e47b323 added the usage of page_cache_next_miss to hugetlb fallocate.
91a2fb956ad99 added the usage to hugetlbfs_pagecache_present.  Both are
in v6.3 and d0ce0e47b323 (referenced here) comes later.  So, I 'think' it
is OK to fix both instances with a single patch and reference the commit
where both are present.  Or, should there be two patches which is more
technically correct?

> Cc: <stable@xxxxxxxxxxxxxxx> #v6.3+
> Reported-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx>
> 
> [1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@xxxxxxxxxx/
> ---
> This patch is meant to fix stable v6.3.1 as safe as possible by doing a
> simple revert.
> 
> Patch page cache: fix page_cache_next/prev_miss off by one by Mike is a
> potential fix that will allow the use of page_cache_next_miss() and is
> awaiting review.
> 
> Patch Fix fallocate error in hugetlbfs when fallocating again by Ackerley
> is another fix but introduces a new function and is also awaiting review.
> 
>  fs/hugetlbfs/inode.c |  8 +++-----
>  mm/hugetlb.c         | 11 +++++------
>  2 files changed, 8 insertions(+), 11 deletions(-)

IMO, this is safest and simplest way of fixing v6.3.  My proposed changes to
page_cache_next/prev_miss have the potential to impact readahead, so really
need review/testing by someone more familiar with that.  If a fix is
urgently needed, I would suggest using this for backport and then either
use my patch or expand Ackerley's proposal to move forward.

As a backport to stable,
Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
-- 
Mike Kravetz

> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 9062da6da5675..6d6cd8f26d76d 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -821,7 +821,6 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
>  		 */
>  		struct folio *folio;
>  		unsigned long addr;
> -		bool present;
>  
>  		cond_resched();
>  
> @@ -845,10 +844,9 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
>  		mutex_lock(&hugetlb_fault_mutex_table[hash]);
>  
>  		/* See if already present in mapping to avoid alloc/free */
> -		rcu_read_lock();
> -		present = page_cache_next_miss(mapping, index, 1) != index;
> -		rcu_read_unlock();
> -		if (present) {
> +		folio = filemap_get_folio(mapping, index);
> +		if (folio) {
> +			folio_put(folio);
>  			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>  			hugetlb_drop_vma_policy(&pseudo_vma);
>  			continue;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 245038a9fe4ea..29ab27d2a3ef5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5666,13 +5666,12 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
>  {
>  	struct address_space *mapping = vma->vm_file->f_mapping;
>  	pgoff_t idx = vma_hugecache_offset(h, vma, address);
> -	bool present;
> -
> -	rcu_read_lock();
> -	present = page_cache_next_miss(mapping, idx, 1) != idx;
> -	rcu_read_unlock();
> +	struct folio *folio;
>  
> -	return present;
> +	folio = filemap_get_folio(mapping, idx);
> +	if (folio)
> +		folio_put(folio);
> +	return folio != NULL;
>  }
>  
>  int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
> -- 
> 2.40.0
>