Re: [PATCH 1/3] fs: Perform writebacks under memalloc_nofs

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Tue, 27 Mar 2018 07:21:50 -0700

On Tue, Mar 27, 2018 at 07:52:48AM -0500, Goldwyn Rodrigues wrote:
> I am not sure if I missed a condition in the code, but here is one of
> the call lineup:
> 
> writepages() -> writepage() -> kmalloc() -> __alloc_pages() ->
> __alloc_pages_nodemask -> __alloc_pages_slowpath ->
> __alloc_pages_direct_reclaim() -> try_to_free_pages() ->
> do_try_to_free_pages() -> shrink_zones() -> shrink_node() ->
> shrink_slab() -> do_shrink_slab() -> shrinker.scan_objects() ->
> super_cache_scan() -> prune_icache_sb() -> fs/inode.c:dispose_list() ->
> evict(inode) -> evict_inode() for ext4 ->  filemap_write_and_wait() ->
> filemap_fdatawrite(mapping) -> __filemap_fdatawrite_range() ->
> do_writepages -> writepages()
> 
> Please note, most filesystems currently have a safeguard in writepage()
> which will return if the PF_MEMALLOC is set. The other safeguard is
> __GFP_FS which we are trying to eliminate.
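
To make the PF_MEMALLOC safeguard concrete, here is a minimal userspace
sketch of the recursion in that call chain. It is a model, not kernel code:
every model_* name is hypothetical, MODEL_PF_MEMALLOC only stands in for
PF_MEMALLOC, and the long reclaim path is collapsed into a single function.
It assumes direct reclaim runs with the flag set and that writepage() backs
off when it sees it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PF_MEMALLOC in task_struct->flags. */
#define MODEL_PF_MEMALLOC 0x1u

struct model_task {
	unsigned int flags;
	int pages_written;	/* writepage calls that ran to completion */
	int bailed_out;		/* writepage calls stopped by the guard */
};

static void model_writepage(struct model_task *t);

/*
 * Collapses __alloc_pages_direct_reclaim() -> ... -> shrink_slab() ->
 * ... -> evict() -> ... -> writepages() into one step.
 */
static void model_direct_reclaim(struct model_task *t)
{
	unsigned int saved = t->flags;

	t->flags |= MODEL_PF_MEMALLOC;	/* reclaim context is marked */
	model_writepage(t);		/* the re-entry in question */
	t->flags = saved;		/* restore on the way out */
}

static void model_writepage(struct model_task *t)
{
	assert(t != NULL);

	/* The safeguard: do nothing if we were entered from reclaim. */
	if (t->flags & MODEL_PF_MEMALLOC) {
		t->bailed_out++;
		return;
	}

	/* Pretend a kmalloc() here hit memory pressure. */
	model_direct_reclaim(t);
	t->pages_written++;
}
```

In this model a single top-level model_writepage() call finishes one page
and records exactly one bailed-out re-entry, so the recursion stops at
depth one.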

But is that harmful?  ext4_writepage() (for example) says that it will
not deadlock in that circumstance:

 * We can get recursively called as show below.
 *
 *      ext4_writepage() -> kmalloc() -> __alloc_pages() -> page_launder() ->
 *              ext4_writepage()
 *
 * But since we don't do any block allocation we should not deadlock.
 * Page also have the dirty flag cleared so we don't get recurive page_lock.

One might well argue that it's not *useful*; if we've gone into
writepage already, there's no point in re-entering writepage.  And the
last thing we want to do is 
But I could see filesystems behaving differently when entered
for writepage-for-regularly-scheduled-writeback versus
writepage-for-shrinking, so maybe they can make progress.
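
As a sketch of what "behaving differently" could look like, the userspace
model below has a writepage that refuses only the work it cannot safely do
when called for reclaim. All model_* names are hypothetical; the
for_reclaim field is meant to mirror the bit of the same name in struct
writeback_control, and the needs_block_alloc condition is an assumption
made for illustration, not a description of any real filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_result { MODEL_WROTE, MODEL_REDIRTIED };

/* Hypothetical cut-down writeback_control. */
struct model_wbc {
	bool for_reclaim;	/* true when called from the shrinking path */
};

struct model_page {
	bool dirty;
	bool needs_block_alloc;	/* writing it would allocate blocks */
};

static enum model_result model_fs_writepage(struct model_page *page,
					    const struct model_wbc *wbc)
{
	assert(page != NULL && wbc != NULL);

	/*
	 * Called for reclaim and the page needs allocation: leave it
	 * dirty and let regularly scheduled writeback handle it.
	 */
	if (wbc->for_reclaim && page->needs_block_alloc) {
		page->dirty = true;
		return MODEL_REDIRTIED;
	}

	/* Otherwise the write can proceed. */
	page->dirty = false;
	return MODEL_WROTE;
}
```

With this shape, a reclaim-time call on a page that needs allocation
returns MODEL_REDIRTIED, while the same page written from scheduled
writeback returns MODEL_WROTE, which is the kind of progress a blanket
early return would rule out.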

Maybe no real filesystem behaves that way.  We need feedback from
filesystem people.