Re: Removing GFP_NOFS

On 03/08/2018 10:06 PM, Dave Chinner wrote:
> On Fri, Mar 09, 2018 at 12:35:35PM +1100, Dave Chinner wrote:
>> On Thu, Mar 08, 2018 at 03:46:18PM -0800, Matthew Wilcox wrote:
>>>
>>> Do we have a strategy for eliminating GFP_NOFS?
>>>
>>> As I understand it, our intent is to mark the areas in individual
>>> filesystems that can't be reentered with memalloc_nofs_save()/restore()
>>> pairs.  Once they're all done, then we can replace all the GFP_NOFS
>>> users with GFP_KERNEL.
>>
>> Won't be that easy, I think.  We recently came across user-reported
>> allocation deadlocks in XFS where we were doing allocation with
>> pages held in the writeback state that lockdep has never triggered
>> on.
>>
>> https://www.spinics.net/lists/linux-xfs/msg16154.html
>>
>> IOWs, GFP_NOFS isn't a solid guide to where
>> memalloc_nofs_save/restore need to cover in the filesystems because
>> there's a surprising amount of code that isn't covered by existing
>> lockdep annotations to warn us about unintended recursion
>> problems.
>>
>> I think we need to start with some documentation of all the generic
>> rules for where these will need to be set, then the per-filesystem
>> rules can be added on top of that...
> 
> So thinking a bit further here:
> 
> * page writeback state gets set and held:
> 	->writepage should be under memalloc_nofs_save
> 	->writepages should be under memalloc_nofs_save
> * page cache write path is often under AOP_FLAG_NOFS
> 	- should probably be under memalloc_nofs_save
> * metadata writeback that uses page cache and page writeback flags
>   should probably be under memalloc_nofs_save
> 
> What other generic code paths are susceptible to allocation
> deadlocks?
> 

AFAIU, these are callbacks into the filesystem from the mm code, executed
when memory is low. So the memory allocations made by filesystem code
inside these callbacks are the ones that should be under
memalloc_nofs_save(), to prevent recursion back into the filesystem.
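
To make the scoping concrete, here is a minimal userspace model of how I
understand the save/restore pair to behave. It is a sketch, not kernel
code: nofs_save(), nofs_restore(), effective_gfp(), task_flags and the
GFP_*/PF_NOFS values are stand-ins I made up for memalloc_nofs_save(),
memalloc_nofs_restore(), the allocator's flag masking, current->flags and
the real constants. fs_writepages_model() is a hypothetical callback body.

```c
/* Userspace model only; names mimic the kernel API but are assumptions. */
#include <assert.h>

#define GFP_IO            0x1u               /* model: may start block I/O */
#define GFP_FS            0x2u               /* model: may call into the fs */
#define GFP_KERNEL_MODEL  (GFP_IO | GFP_FS)
#define PF_NOFS           0x1u               /* model of the per-task NOFS flag */

static unsigned int task_flags;              /* stands in for current->flags */

/* Enter a NOFS scope and return the previous state so scopes can nest. */
static unsigned int nofs_save(void)
{
    unsigned int old = task_flags & PF_NOFS;

    task_flags |= PF_NOFS;
    return old;
}

/* Leave a NOFS scope, restoring whatever the enclosing scope had set. */
static void nofs_restore(unsigned int old)
{
    task_flags = (task_flags & ~PF_NOFS) | old;
}

/* What the allocator would see: GFP_FS is masked while the scope is active. */
static unsigned int effective_gfp(unsigned int gfp)
{
    if (task_flags & PF_NOFS)
        gfp &= ~GFP_FS;
    return gfp;
}

/* Hypothetical ->writepages body: every allocation inside is implicitly NOFS. */
static unsigned int fs_writepages_model(void)
{
    unsigned int saved = nofs_save();
    unsigned int gfp = effective_gfp(GFP_KERNEL_MODEL);

    nofs_restore(saved);
    return gfp;
}
```

The point of the model is that callers keep passing the GFP_KERNEL
equivalent and the scope strips the FS bit, so nested helpers do not need
to know they are running under writeback.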

OTOH (contradicting myself here), ->writepages, in essence writeback, is
performed by per-BDI flusher threads, which the mm code kicks in
low-memory situations, rather than by the thread performing the allocation.

As Tetsuo pointed out, direct reclaim is the really problematic scenario.

The shrinkers registered by filesystem code are also susceptible. However,
I know of no shrinkers that allocate memory or perform locking, thanks to
smartly swapping entries into a temporary local list variable.
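
Roughly, the local-list pattern I have in mind looks like the userspace
sketch below. All names here (struct obj, cache_head, cache_lock(),
shrink_scan_model(), frees_under_lock) are hypothetical and only model the
idea; in this model the cache lock is held just long enough to unlink
entries, which is my assumption about how such shrinkers are structured,
and the freeing happens afterwards with no lock held and no allocation.

```c
/* Userspace model of a shrinker that isolates entries onto a local list. */
#include <assert.h>
#include <stdlib.h>

struct obj {
    struct obj *next;
    int freeable;                 /* model: entry is unused and may be reclaimed */
};

static struct obj *cache_head;    /* hypothetical filesystem object cache */
static int cache_locked;          /* model of the lock protecting the cache */
static int frees_under_lock;      /* counts frees done while the lock is held */

static void cache_lock(void)   { assert(!cache_locked); cache_locked = 1; }
static void cache_unlock(void) { assert(cache_locked);  cache_locked = 0; }

/* Add one entry to the cache (test helper for the model). */
static void cache_add(int freeable)
{
    struct obj *o = malloc(sizeof(*o));

    if (!o)
        return;
    o->freeable = freeable;
    cache_lock();
    o->next = cache_head;
    cache_head = o;
    cache_unlock();
}

/* Scan up to nr_to_scan entries; return how many were freed. */
static unsigned long shrink_scan_model(unsigned long nr_to_scan)
{
    struct obj *dispose = NULL;   /* temporary local list */
    struct obj **pp;
    unsigned long freed = 0;

    /* Phase 1: under the lock, only unlink; no allocation, no freeing. */
    cache_lock();
    pp = &cache_head;
    while (*pp && nr_to_scan > 0) {
        struct obj *o = *pp;

        nr_to_scan--;
        if (o->freeable) {
            *pp = o->next;        /* unlink from the cache */
            o->next = dispose;    /* push onto the local list */
            dispose = o;
        } else {
            pp = &o->next;
        }
    }
    cache_unlock();

    /* Phase 2: free the isolated entries with the lock dropped. */
    while (dispose) {
        struct obj *o = dispose;

        dispose = o->next;
        if (cache_locked)
            frees_under_lock++;
        free(o);
        freed++;
    }
    return freed;
}
```

Since the second phase touches only the local list, nothing it does can
recurse into reclaim while the cache lock is held.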


-- 
Goldwyn


