Re: How to handle TIF_MEMDIE stalls?

Michal Hocko <mhocko@xxxxxxx> · Tue, 17 Feb 2015 17:50:24 +0100

On Tue 17-02-15 08:16:18, Johannes Weiner wrote:
> On Tue, Feb 17, 2015 at 08:57:05PM +0900, Tetsuo Handa wrote:
> > Johannes Weiner wrote:
> > > On Mon, Feb 16, 2015 at 08:23:16PM +0900, Tetsuo Handa wrote:
> > > >   (2) Implement TIF_MEMDIE timeout.
> > > 
> > > How about something like this?  This should solve the deadlock problem
> > > in the page allocator, but it would also simplify the memcg OOM killer
> > > and allow its use by in-kernel faults again.
> > 
> > Yes, the basic idea would be the same as
> > http://marc.info/?l=linux-mm&m=142002495532320&w=2 .
> > 
> > But Michal and David do not like the timeout approach.
> > http://marc.info/?l=linux-mm&m=141684783713564&w=2
> > http://marc.info/?l=linux-mm&m=141686814824684&w=2

Yes, I really hate time-based solutions, for the reasons already explained
in the referenced links.

> I'm open to suggestions, but we can't just stick our heads in the sand
> and pretend that these are just unrelated bugs.  They're not. 

Requesting a __GFP_NOFAIL allocation with locks held is IMHO a bug and
should be fixed.
Hopelessly looping in the page allocator without __GFP_NOFAIL is too risky
as well and we should get rid of it. Why should we keep looping when the
previous 1000 attempts failed even though the OOM killer was invoked? Can
we simply fail after a configurable number of attempts? This is likely to
reveal unchecked allocation failures, but those are bugs as well and we
shouldn't pretend otherwise.
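
To make the "fail after a configurable number of attempts" idea concrete,
here is a minimal userspace sketch. It is not kernel code and not a proposed
patch: struct alloc_policy, struct fake_allocator, try_alloc_once() and
alloc_bounded() are all hypothetical names invented for illustration, and the
"OOM killer" is modelled as nothing more than a counter.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical policy: none of these names exist in the kernel. */
struct alloc_policy {
	unsigned int max_attempts;	/* assumed configurable retry limit */
	int nofail;			/* models __GFP_NOFAIL: never give up */
};

/* Hypothetical stand-in for allocator state under memory pressure. */
struct fake_allocator {
	unsigned int failures_left;	/* attempts that fail before one succeeds */
	unsigned int oom_kills;		/* how often the "OOM killer" was invoked */
};

/* One pass through the allocator: fails while failures_left > 0. */
static void *try_alloc_once(struct fake_allocator *a, size_t size)
{
	if (a->failures_left > 0) {
		a->failures_left--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Retry until success. A caller without nofail gets NULL back once
 * max_attempts passes have failed; a nofail caller keeps looping.
 */
static void *alloc_bounded(struct fake_allocator *a,
			   const struct alloc_policy *p, size_t size)
{
	unsigned int attempts = 0;

	for (;;) {
		void *ptr = try_alloc_once(a, size);

		if (ptr)
			return ptr;

		attempts++;
		if (!p->nofail && attempts >= p->max_attempts)
			return NULL;	/* give up, caller must handle failure */

		a->oom_kills++;		/* stand-in for invoking the OOM killer */
	}
}
```

With max_attempts = 3 and an allocator that fails five times in a row, a
regular caller sees NULL after the third failed pass, while a nofail caller
keeps going until the sixth pass succeeds. Any caller that does not check
for NULL is exposed immediately, which is exactly the class of bug mentioned
above.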

> As long
> as it's legal to enter the allocator with *anything* that can prevent
> another random task in the system from making progress, we have this
> deadlock potential.  One side has to give up, and it can't be the page
> allocator because it has to support __GFP_NOFAIL allocations, which
> are usually exactly the allocations that are buried in hard-to-unwind
> state that is likely to trip up exiting OOM victims.

I am not convinced that __GFP_NOFAIL is the biggest problem. Most of the
OOM livelocks I have seen were either due to GFP_KERNEL being treated as
__GFP_NOFAIL or due to an incorrect gfp mask (e.g. __GFP_FS added where
not appropriate). I think we should focus on this part before we start
adding heuristics to the OOM killer.

> The alternative would be lock dependency tracking, but I'm not sure it
> can be realistically done for production environments.

-- 
Michal Hocko
SUSE Labs
