Re: [PATCH 1/3] fs: Perform writebacks under memalloc_nofs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 03/27/2018 09:21 AM, Matthew Wilcox wrote:
> On Tue, Mar 27, 2018 at 07:52:48AM -0500, Goldwyn Rodrigues wrote:
>> I am not sure if I missed a condition in the code, but here is one of
>> the call lineup:
>>
>> writepages() -> writepage() -> kmalloc() -> __alloc_pages() ->
>> __alloc_pages_nodemask -> __alloc_pages_slowpath ->
>> __alloc_pages_direct_reclaim() -> try_to_free_pages() ->
>> do_try_to_free_pages() -> shrink_zones() -> shrink_node() ->
>> shrink_slab() -> do_shrink_slab() -> shrinker.scan_objects() ->
>> super_cache_scan() -> prune_icache_sb() -> fs/inode.c:dispose_list() ->
>> evict(inode) -> evict_inode() for ext4 ->  filemap_write_and_wait() ->
>> filemap_fdatawrite(mapping) -> __filemap_fdatawrite_range() ->
>> do_writepages -> writepages()
>>
>> Please note, most filesystems currently have a safeguard in writepage()
>> which will return if the PF_MEMALLOC is set. The other safeguard is
>> __GFP_FS which we are trying to eliminate.
> 
> But is that harmful?  ext4_writepage() (for example) says that it will
> not deadlock in that circumstance:

No, it is not harmful.

> 
>  * We can get recursively called as show below.
>  *
>  *      ext4_writepage() -> kmalloc() -> __alloc_pages() -> page_launder() ->
>  *              ext4_writepage()
>  *
>  * But since we don't do any block allocation we should not deadlock.
>  * Page also have the dirty flag cleared so we don't get recurive page_lock.

Yes, and it avoids this by checking for PF_MEMALLOC flag.

> 
> One might well argue that it's not *useful*; if we've gone into
> writepage already, there's no point in re-entering writepage.  And the
> last thing we want to do is 

?

> But I could see filesystems behaving differently when entered
> for writepage-for-regularly-scheduled-writeback versus
> writepage-for-shrinking, so maybe they can make progress.
> 

do_writepages() is the same for both, and hence the memalloc_* API patch.

> Maybe no real filesystem behaves that way.  We need feedback from
> filesystem people.

The idea is to:
* Keep a central location for check, rather than individual filesystem
writepage(). It should reduce code as well.
* Filesystem developers call memory allocations without thinking twice
about which GFP flag to use: GFP_KERNEL or GFP_NOFS. In essence
eliminate GFP_NOFS.


-- 
Goldwyn



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux