Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read

NeilBrown <neilb@xxxxxxx> · Thu, 19 Mar 2015 08:38:35 +1100

On Wed, 18 Mar 2015 16:45:40 +0100 Michal Hocko <mhocko@xxxxxxx> wrote:

> What do you think about this v2? I cannot say I like it, but to be
> honest I really dislike the whole mapping_gfp_mask API.
> ---
> From d88010d6f5f59d7eb87b691e27e201d12cab9141 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@xxxxxxx>
> Date: Wed, 18 Mar 2015 16:06:40 +0100
> Subject: [PATCH] mm: Allow __GFP_FS for page_cache_read page cache allocation
> 
> page_cache_read has historically used page_cache_alloc_cold to
> allocate a new page. This means that mapping_gfp_mask is used as the
> base for the gfp_mask. Many filesystems set this mask to
> GFP_NOFS to prevent fs recursion issues. page_cache_read is,
> however, not called from the fs layer so it doesn't need this
> protection. Even ceph and ocfs2, which call filemap_fault from their
> fault handlers, seem to be OK because they do not take any fs lock
> before invoking the generic implementation.
> 
> The protection might even be harmful. There is a strong push to fail
> GFP_NOFS allocations rather than loop within the allocator indefinitely
> with a very limited reclaim ability. Once we start failing those requests
> the OOM killer might be triggered prematurely, because the page cache
> allocation failure is propagated up the page fault path and ends up in
> pagefault_out_of_memory.
> 
> Add __GFP_FS and __GFP_IO to the gfp mask that comes from the
> mapping to fix this issue.
> 
> Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
> ---
>  mm/filemap.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 968cd8e03d2e..8b50d5eb52b2 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1752,7 +1752,15 @@ static int page_cache_read(struct file *file, pgoff_t offset)
>  	int ret;
>  
>  	do {
> -		page = page_cache_alloc_cold(mapping);
> +		gfp_t page_cache_gfp = mapping_gfp_mask(mapping)|__GFP_COLD;
> +
> +		/*
> +		 * This code is not called from the fs layer so we do not need
> +		 * reclaim recursion protection. !__GFP_FS allocations might fail too
> +		 * easily and trigger the OOM killer prematurely.
> +		 */
> +		page_cache_gfp |= __GFP_FS | __GFP_IO;
> +		page = __page_cache_alloc(page_cache_gfp);
>  		if (!page)
>  			return -ENOMEM;
>  
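To make the mask arithmetic in the quoted hunk concrete, here is a minimal userspace model. The flag values, `struct mapping_model`, and both helper functions are illustrative assumptions, not kernel code; only the idea (start from the mapping's mask, add `__GFP_COLD`, then force `__GFP_FS | __GFP_IO`) is taken from the patch.

```c
#include <assert.h>

/* Illustrative flag bits only; these are NOT the kernel's gfp.h values. */
typedef unsigned int gfp_t;
#define __GFP_WAIT 0x01u
#define __GFP_IO   0x02u
#define __GFP_FS   0x04u
#define __GFP_COLD 0x08u
#define GFP_NOFS   (__GFP_WAIT | __GFP_IO)
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)

/* Hypothetical stand-in for struct address_space: only the mask matters. */
struct mapping_model {
	gfp_t gfp_mask;
};

/* Old path: page_cache_alloc_cold() uses the mapping mask as-is, plus COLD. */
static gfp_t old_page_cache_read_gfp(const struct mapping_model *m)
{
	return m->gfp_mask | __GFP_COLD;
}

/* Patched path: same starting point, then FS and IO are forced on. */
static gfp_t new_page_cache_read_gfp(const struct mapping_model *m)
{
	gfp_t gfp = m->gfp_mask | __GFP_COLD;

	gfp |= __GFP_FS | __GFP_IO;
	return gfp;
}
```

With a mapping set to GFP_NOFS, the old path never sets `__GFP_FS`, while the patched path always does, whatever the filesystem chose for its mapping.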

Nearly half the places in the kernel which call mapping_gfp_mask() remove the
__GFP_FS bit.

That suggests to me that it might make sense to have
   mapping_gfp_mask_fs()
and
   mapping_gfp_mask_nofs()

and let the presence of __GFP_FS (and __GFP_IO) be determined by the
call-site rather than the filesystem.
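One possible shape for the two helpers suggested above, again as a userspace sketch. `mapping_gfp_mask_fs()` and `mapping_gfp_mask_nofs()` are hypothetical (they are only proposed here), the flag values and `struct address_space_model` are illustrative, and it is an assumption that the `_nofs` variant clears only `__GFP_FS` while leaving `__GFP_IO` as the mapping set it.

```c
#include <assert.h>

/* Illustrative flag bits only; these are NOT the kernel's gfp.h values. */
typedef unsigned int gfp_t;
#define __GFP_WAIT    0x01u
#define __GFP_IO      0x02u
#define __GFP_FS      0x04u
#define __GFP_HIGHMEM 0x10u

/* Hypothetical stand-in for struct address_space. */
struct address_space_model {
	gfp_t gfp_mask;	/* mask chosen by the filesystem */
};

static gfp_t mapping_gfp_mask(const struct address_space_model *mapping)
{
	return mapping->gfp_mask;
}

/* Hypothetical helper: caller is outside the fs, so fs/io reclaim is allowed. */
static gfp_t mapping_gfp_mask_fs(const struct address_space_model *mapping)
{
	return mapping_gfp_mask(mapping) | __GFP_FS | __GFP_IO;
}

/* Hypothetical helper: caller may hold fs locks, so fs reclaim is forbidden. */
static gfp_t mapping_gfp_mask_nofs(const struct address_space_model *mapping)
{
	return mapping_gfp_mask(mapping) & ~__GFP_FS;
}
```

The filesystem still controls placement bits such as highmem through its mapping mask; each call site decides only whether reclaim may recurse into the filesystem.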

However, I am a bit concerned about drivers/block/loop.c.
Might a filesystem read on the loop block device wait for a page_cache_read()
on the loop-mounted file? In that case you really don't want __GFP_FS set
when allocating that page.
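The loop concern can be illustrated with the same kind of model. It assumes that drivers/block/loop.c strips `__GFP_IO` and `__GFP_FS` from the backing file's mapping mask when the file is bound; the struct, flag values, and helper names below are hypothetical. Under that assumption, the patched computation turns those bits back on, so a page_cache_read() on the backing file could enter filesystem reclaim that in turn waits on I/O to the loop device.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits only; these are NOT the kernel's gfp.h values. */
typedef unsigned int gfp_t;
#define __GFP_WAIT 0x01u
#define __GFP_IO   0x02u
#define __GFP_FS   0x04u

/* Hypothetical stand-in for the backing file's address_space. */
struct mapping_model {
	gfp_t gfp_mask;
};

/* Assumed loop behaviour: strip IO/FS so allocations for the backing
 * file cannot recurse into reclaim that waits on the loop device. */
static void loop_bind_backing_file(struct mapping_model *m)
{
	m->gfp_mask &= ~(__GFP_IO | __GFP_FS);
}

/* Models the patched page_cache_read(): forces IO/FS back on. */
static gfp_t patched_page_cache_read_gfp(const struct mapping_model *m)
{
	return m->gfp_mask | __GFP_FS | __GFP_IO;
}

/* True if an allocation with this mask may enter filesystem reclaim. */
static bool may_enter_fs_reclaim(gfp_t gfp)
{
	return (gfp & __GFP_FS) != 0;
}
```

In this model the loop driver's protection holds for the mapping mask itself but is silently overridden by the patched read path.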

NeilBrown