I am not really sure it is a good idea to blindly apply this patch to
the older stable tree. Have you checked all filemap_fault handlers?
This might have been quite different in 3.18 than in the kernel I was
developing this against. If the sole purpose of this backport is to
make another patch (abc1be13fd11 ("mm/filemap.c: fix NULL pointer in
page_cache_tree_insert()")) apply more easily, then I've already
suggested how to handle those rejects.

On Mon 23-04-18 01:37:44, Harsh Shandilya wrote:
> From: Michal Hocko <mhocko@xxxxxxxx>
> 
> Commit c20cd45eb01748f0fba77a504f956b000df4ea73 upstream.
> 
> page_cache_read has historically been using page_cache_alloc_cold to
> allocate a new page. This means that mapping_gfp_mask is used as the
> base for the gfp_mask. Many filesystems set this mask to GFP_NOFS to
> prevent fs recursion issues. page_cache_read is called from the
> vm_operations_struct::fault() context during the page fault. This
> context doesn't normally need the reclaim protection.
> 
> ceph and ocfs2, which call filemap_fault from their fault handlers,
> seem to be OK because they do not take any fs lock before invoking
> the generic implementation. xfs, which takes XFS_MMAPLOCK_SHARED, is
> safe from the reclaim recursion POV because this lock serializes
> truncate and punch hole with the page faults and does not get
> involved in the reclaim.
> 
> There is simply no reason to deliberately use a weaker allocation
> context when __GFP_FS | __GFP_IO can be used. The GFP_NOFS
> protection might even be harmful. There is a push to fail GFP_NOFS
> allocations rather than loop within the allocator indefinitely with
> a very limited reclaim ability. Once we start failing those requests
> the OOM killer might be triggered prematurely because the page cache
> allocation failure is propagated up the page fault path and ends up
> in pagefault_out_of_memory.
> 
> We cannot play with mapping_gfp_mask directly because that would be
> racy wrt. parallel page faults and it might interfere with other
> users who really rely on the NOFS semantics of the stored gfp_mask.
> The mask is also inode-proper, so changing it there would even be a
> layering violation. What we can do instead is to push the gfp_mask
> into struct vm_fault and allow the fs layer to overwrite it should
> the callback need to be called with a different allocation context.
> 
> Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO)
> because this should normally be safe from the page fault path. Why
> do we care about mapping_gfp_mask at all then? Because it does not
> hold only reclaim protection flags; it might also contain zone and
> movability restrictions (GFP_DMA32, __GFP_MOVABLE and others), so we
> have to respect those.
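
For reference, the mechanism this adds is that a fault handler may
override the MM-provided mask before calling into the generic code. A
minimal sketch against the two-argument ->fault() signature used in
this tree (myfs_fault and its NOFS requirement are made up for
illustration, they are not part of the patch):

        static int myfs_fault(struct vm_area_struct *vma,
                              struct vm_fault *vmf)
        {
                /*
                 * Restore this mapping's own (e.g. GFP_NOFS)
                 * allocation context before the generic fault path
                 * allocates page cache pages on our behalf.
                 */
                vmf->gfp_mask = mapping_gfp_mask(vma->vm_file->f_mapping);

                return filemap_fault(vma, vmf);
        }

Handlers which do not care see mapping_gfp_mask() | __GFP_FS |
__GFP_IO, as set up by __get_fault_gfp_mask() in the mm/memory.c hunk
below.
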
> 
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> Reported-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Acked-by: Jan Kara <jack@xxxxxxxx>
> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
> Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> Cc: Mark Fasheh <mfasheh@xxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Harsh Shandilya <harsh@xxxxxxxx>
> ---
>  include/linux/mm.h |  4 ++++
>  mm/filemap.c       |  8 ++++----
>  mm/memory.c        | 17 +++++++++++++++++
>  3 files changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5adffb0a468f..9ac4697979e8 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -203,9 +203,13 @@ extern pgprot_t protection_map[16];
>   *
>   * pgoff should be used in favour of virtual_address, if possible. If pgoff
>   * is used, one may implement ->remap_pages to get nonlinear mapping support.
> + *
> + * MM layer fills up gfp_mask for page allocations but fault handler might
> + * alter it if its implementation requires a different allocation context.
>   */
>  struct vm_fault {
>          unsigned int flags;             /* FAULT_FLAG_xxx flags */
> +        gfp_t gfp_mask;                 /* gfp mask to be used for allocations */
>          pgoff_t pgoff;                  /* Logical page offset based on vma */
>          void __user *virtual_address;   /* Faulting virtual address */
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 7e6ab98d4d3c..aafeeefcb00d 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1746,18 +1746,18 @@ EXPORT_SYMBOL(generic_file_read_iter);
>   * This adds the requested page to the page cache if it isn't already there,
>   * and schedules an I/O to read in its contents from disk.
>   */
> -static int page_cache_read(struct file *file, pgoff_t offset)
> +static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
>  {
>          struct address_space *mapping = file->f_mapping;
>          struct page *page;
>          int ret;
> 
>          do {
> -                page = page_cache_alloc_cold(mapping);
> +                page = __page_cache_alloc(gfp_mask|__GFP_COLD);
>                  if (!page)
>                          return -ENOMEM;
> 
> -                ret = add_to_page_cache_lru(page, mapping, offset, GFP_KERNEL);
> +                ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
>                  if (ret == 0)
>                          ret = mapping->a_ops->readpage(file, page);
>                  else if (ret == -EEXIST)
> @@ -1940,7 +1940,7 @@ no_cached_page:
>           * We're only likely to ever get here if MADV_RANDOM is in
>           * effect.
>           */
> -        error = page_cache_read(file, offset);
> +        error = page_cache_read(file, offset, vmf->gfp_mask);
> 
>          /*
>           * The page we want has now been added to the page cache.
> diff --git a/mm/memory.c b/mm/memory.c
> index 0c4f5e36b155..5a62c6a42143 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1973,6 +1973,20 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
>          copy_user_highpage(dst, src, va, vma);
>  }
> 
> +static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
> +{
> +        struct file *vm_file = vma->vm_file;
> +
> +        if (vm_file)
> +                return mapping_gfp_mask(vm_file->f_mapping) | __GFP_FS | __GFP_IO;
> +
> +        /*
> +         * Special mappings (e.g. VDSO) do not have any file so fake
> +         * a default GFP_KERNEL for them.
> +         */
> +        return GFP_KERNEL;
> +}
> +
>  /*
>   * Notify the address space that the page is about to become writable so that
>   * it can prohibit this or wait for the page to get into an appropriate state.
> @@ -1988,6 +2002,7 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
>          vmf.virtual_address = (void __user *)(address & PAGE_MASK);
>          vmf.pgoff = page->index;
>          vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
> +        vmf.gfp_mask = __get_fault_gfp_mask(vma);
>          vmf.page = page;
> 
>          ret = vma->vm_ops->page_mkwrite(vma, &vmf);
> @@ -2670,6 +2685,7 @@ static int __do_fault(struct vm_area_struct *vma, unsigned long address,
>          vmf.pgoff = pgoff;
>          vmf.flags = flags;
>          vmf.page = NULL;
> +        vmf.gfp_mask = __get_fault_gfp_mask(vma);
> 
>          ret = vma->vm_ops->fault(vma, &vmf);
>          if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
> @@ -2834,6 +2850,7 @@ static void do_fault_around(struct vm_area_struct *vma, unsigned long address,
>          vmf.pgoff = pgoff;
>          vmf.max_pgoff = max_pgoff;
>          vmf.flags = flags;
> +        vmf.gfp_mask = __get_fault_gfp_mask(vma);
>          vma->vm_ops->map_pages(vma, &vmf);
>  }
> 
> -- 
> 2.15.0.2308.g658a28aa74af

-- 
Michal Hocko
SUSE Labs