Re: [PATCH 1/3] fs: Perform writebacks under memalloc_nofs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 27, 2018 at 10:13:53AM -0500, Goldwyn Rodrigues wrote:
> On 03/27/2018 09:21 AM, Matthew Wilcox wrote:
> > On Tue, Mar 27, 2018 at 07:52:48AM -0500, Goldwyn Rodrigues wrote:
> >> I am not sure if I missed a condition in the code, but here is one of
> >> the call lineup:
> >>
> >> writepages() -> writepage() -> kmalloc() -> __alloc_pages() ->
> >> __alloc_pages_nodemask -> __alloc_pages_slowpath ->
> >> __alloc_pages_direct_reclaim() -> try_to_free_pages() ->
> >> do_try_to_free_pages() -> shrink_zones() -> shrink_node() ->
> >> shrink_slab() -> do_shrink_slab() -> shrinker.scan_objects() ->
> >> super_cache_scan() -> prune_icache_sb() -> fs/inode.c:dispose_list() ->
> >> evict(inode) -> evict_inode() for ext4 ->  filemap_write_and_wait() ->
> >> filemap_fdatawrite(mapping) -> __filemap_fdatawrite_range() ->
> >> do_writepages -> writepages()
> >>
> >> Please note, most filesystems currently have a safeguard in writepage()
> >> which will return if the PF_MEMALLOC is set. The other safeguard is
> >> __GFP_FS which we are trying to eliminate.
> > 
> > But is that harmful?  ext4_writepage() (for example) says that it will
> > not deadlock in that circumstance:
> 
> No, it is not harmful.
> 
> > 
> >  * We can get recursively called as show below.
> >  *
> >  *      ext4_writepage() -> kmalloc() -> __alloc_pages() -> page_launder() ->
> >  *              ext4_writepage()
> >  *
> >  * But since we don't do any block allocation we should not deadlock.
> >  * Page also have the dirty flag cleared so we don't get recurive page_lock.
> 
> Yes, and it avoids this by checking for PF_MEMALLOC flag.
> 
> > 
> > One might well argue that it's not *useful*; if we've gone into
> > writepage already, there's no point in re-entering writepage.  And the
> > last thing we want to do is 
> 
> ?

Sorry, got cut off.  The last thing we want to do is blow the stack by
recursing too deeply, but I don't think we're going to go through this
loop more than once.

> > But I could see filesystems behaving differently when entered
> > for writepage-for-regularly-scheduled-writeback versus
> > writepage-for-shrinking, so maybe they can make progress.
> > 
> 
> do_writepages() is the same for both, and hence the memalloc_* API patch.

But we don't want to avoid this particular recursion.  We only need to
avoid the recursion if it would result in a deadlock.

> > Maybe no real filesystem behaves that way.  We need feedback from
> > filesystem people.
> 
> The idea is to:
> * Keep a central location for check, rather than individual filesystem
> writepage(). It should reduce code as well.
> * Filesystem developers call memory allocations without thinking twice
> about which GFP flag to use: GFP_KERNEL or GFP_NOFS. In essence
> eliminate GFP_NOFS.

I know the goal is to eliminate GFP_NOFS.  I'm very much in favour
of that idea.  I'm just not sure you're going about it the right way.
Probably we will have a good discussion about it next month.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux