Re: [PATCH 4/9] mm: introduce memalloc_nofs_{save,restore} API

Michal Hocko <mhocko@xxxxxxxxxx> · Sun, 18 Dec 2016 17:21:59 +0100

On Sat 17-12-16 19:44:22, Tetsuo Handa wrote:
> On 2016/12/15 23:07, Michal Hocko wrote:
> > GFP_NOFS context is used for the following 5 reasons currently
> > 	- to prevent from deadlocks when the lock held by the allocation
> > 	  context would be needed during the memory reclaim
> > 	- to prevent from stack overflows during the reclaim because
> > 	  the allocation is performed from a deep context already
> > 	- to prevent lockups when the allocation context depends on
> > 	  other reclaimers to make a forward progress indirectly
> > 	- just in case because this would be safe from the fs POV
> > 	- silence lockdep false positives
> > 
> > Unfortunately overuse of this allocation context brings some problems
> > to the MM. Memory reclaim is much weaker (especially during heavy FS
> > metadata workloads), OOM killer cannot be invoked because the MM layer
> > doesn't have enough information about how much memory is freeable by the
> > FS layer.
> 
> This series is intended for simply applying "& ~__GFP_FS" mask to allocations
> which are using GFP_KERNEL by error for the current thread, isn't it?

Not really. I've tried to cover that in the changelogs, but in short I
would like to reach a state where this API marks all the
recursion-dangerous places, each documented with the reason why, and
most/all of the specific allocations do not care about NOFS at all. They
will simply inherit the NOFS scope when necessary.
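
To make the scope idea concrete, here is a minimal userspace sketch of
the save/restore pattern. It is only a model, not the kernel
implementation: every model_* and MODEL_* name is hypothetical, the flag
values are arbitrary, and a single static word stands in for the
per-task flags.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel definitions; values are arbitrary. */
#define MODEL_GFP_IO      0x1u
#define MODEL_GFP_FS      0x2u
#define MODEL_GFP_KERNEL  (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS    (MODEL_GFP_IO)
#define MODEL_PF_NOFS     0x1u

/* Models the flags word of the current task (single task only). */
static unsigned int model_task_flags;

/* Enter a NOFS scope; return the previous state so scopes can nest. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave a NOFS scope by putting back the state seen at save time. */
static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/* What an allocator would do: drop the FS bit while inside a scope. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/* A callee in another layer: it asks for GFP_KERNEL and knows nothing
 * about the FS locking context of its caller. */
static unsigned int model_helper_alloc(void)
{
	return model_effective_gfp(MODEL_GFP_KERNEL);
}

static void model_demo(void)
{
	unsigned int outer, inner;

	assert(model_helper_alloc() == MODEL_GFP_KERNEL);

	outer = model_nofs_save();
	assert(model_helper_alloc() == MODEL_GFP_NOFS);

	inner = model_nofs_save();
	model_nofs_restore(inner);
	/* Still inside the outer scope, so FS stays masked. */
	assert(model_helper_alloc() == MODEL_GFP_NOFS);

	model_nofs_restore(outer);
	assert(model_helper_alloc() == MODEL_GFP_KERNEL);
}
```

The callee never passes NOFS explicitly; only the place that knows about
the recursion danger opens the scope, and that is the one place which
needs a comment explaining why.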

> > In many cases it is far from clear why the weaker context is even used
> > and so it might be used unnecessarily. We would like to get rid of
> > those as much as possible. One way to do that is to use the flag in
> > scopes rather than isolated cases. Such a scope is declared when really
> > necessary, tracked per task and all the allocation requests from within
> > the context will simply inherit the GFP_NOFS semantic.
> > 
> > Not only this is easier to understand and maintain because there are
> > much less problematic contexts than specific allocation requests, this
> > also helps code paths where FS layer interacts with other layers (e.g.
> > crypto, security modules, MM etc...) and there is no easy way to convey
> > the allocation context between the layers.
> 
> I haven't heard an answer to "a terrible thing" in
> http://lkml.kernel.org/r/20160427200530.GB22544@xxxxxxxxxxxxxx .
> 
> What is your plan for checking whether we need to propagate "& ~__GFP_FS"
> mask to other threads which current thread waits synchronously (e.g.
> wait_for_completion()) from "& ~__GFP_FS" context?

This needs a deeper inspection. First of all we have to find out whether
we have any _relevant_ code which depends on kworkers (without
WQ_MEM_RECLAIM) from the NOFS context. This is not covered by this patch
series, though. I plan to get to it later, once we have actually
finished this step.
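
The concern in the question can be modelled in a few lines. This is a
userspace sketch under stated assumptions, not kernel code: the toy_*
and TOY_* names are hypothetical and the scope is represented as a flag
in an explicit per-task struct. It only shows why a per-task scope does
not follow work handed to another task; whether any relevant kernel code
actually depends on that is the open question.

```c
#include <assert.h>

/* Hypothetical stand-ins; values are arbitrary. */
#define TOY_GFP_IO      0x1u
#define TOY_GFP_FS      0x2u
#define TOY_GFP_KERNEL  (TOY_GFP_IO | TOY_GFP_FS)
#define TOY_PF_NOFS     0x1u

struct toy_task {
	unsigned int flags;
};

/* The mask is derived from the flags of the task doing the allocation. */
static unsigned int toy_effective_gfp(const struct toy_task *t,
				      unsigned int gfp)
{
	if (t->flags & TOY_PF_NOFS)
		gfp &= ~TOY_GFP_FS;
	return gfp;
}

/* Work executed by a worker task on behalf of somebody else. */
static unsigned int toy_run_work(const struct toy_task *worker)
{
	return toy_effective_gfp(worker, TOY_GFP_KERNEL);
}

static void toy_demo(void)
{
	struct toy_task submitter = { .flags = 0 };
	struct toy_task kworker = { .flags = 0 };

	/* The submitter enters a NOFS scope. */
	submitter.flags |= TOY_PF_NOFS;

	/* Its own allocations inherit the scope. */
	assert(!(toy_effective_gfp(&submitter, TOY_GFP_KERNEL) & TOY_GFP_FS));

	/* Work it waits for runs in another task and keeps the FS bit. */
	assert(toy_run_work(&kworker) & TOY_GFP_FS);
}
```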

-- 
Michal Hocko
SUSE Labs
