Re: [PATCH] mm,oom: Re-enable OOM killer using timers.

David Rientjes <rientjes@xxxxxxxxxx> · Tue, 26 Jan 2016 15:44:39 -0800 (PST)

On Fri, 22 Jan 2016, Tetsuo Handa wrote:

> > >   (1) Design and use a system with appropriate memory capacity in mind.
> > > 
> > >   (2) When (1) failed, the OOM killer is invoked. The OOM killer selects
> > >       an OOM victim and allows that victim access to memory reserves by
> > >       setting TIF_MEMDIE on it.
> > > 
> > >   (3) When (2) did not solve the OOM condition, start allowing all tasks
> > >       access to memory reserves by your approach.
> > > 
> > >   (4) When (3) did not solve the OOM condition, start selecting more OOM
> > >       victims by my approach.
> > > 
> > >   (5) When (4) did not solve the OOM condition, trigger the kernel panic.
> > > 
> > 
> > This was all mentioned previously, and I suggested that the panic only 
> > occur when memory reserves have been depleted, otherwise there is still 
> > the potential for the livelock to be solved.  That is a patch that would 
> > apply today, before any of this work, since we never want to loop 
> > endlessly in the page allocator when memory reserves are fully depleted.
> > 
> > This is all really quite simple.
> > 
> 
> So, David is OK with the above approach, right?
> Then, Michal and Johannes, are you OK with the above approach?
> 

The first step before implementing access to memory reserves on livelock 
(my patch) and oom killing additional processes on livelock (your patch) 
is to detect the appropriate place to panic() when reserves are depleted.

This has historically been done in the oom killer when there are no oom 
killable processes left.  That's easy to figure out and should still be 
done, but we are now introducing the possibility of memory reserves being 
fully depleted while there are oom killable processes left or victims that
cannot exit.

So we need a patch to the page allocator that would be applicable today 
before any of the above is worked on to detect when reserves are depleted 
and panic() rather than loop forever in the page allocator.  I'd suggest 
that this work be done as a follow-up to Michal's patchset to rework the 
page allocator retry logic.
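
As a rough illustration of that check (a userspace model, not code from any
posted patch), the decision after a failed attempt could look like the sketch
below.  All names here (zone_model, reserves_depleted, after_failed_attempt)
are hypothetical and are not kernel symbols, and the model only covers order-0
requests, so fragmentation is deliberately ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel symbols. */
struct zone_model {
	unsigned long free_pages;	/* pages still free, reserves included */
};

enum alloc_result {
	ALLOC_RETRY,	/* some reserve is left: looping may still help */
	ALLOC_PANIC,	/* reserves fully depleted: stop looping */
};

/* Reserves count as depleted when no eligible zone has a free page left. */
static bool reserves_depleted(const struct zone_model *zones, size_t nr_zones)
{
	for (size_t i = 0; i < nr_zones; i++)
		if (zones[i].free_pages > 0)
			return false;
	return true;
}

/*
 * Decision taken after an order-0 attempt that ignored watermarks has
 * failed.  A real implementation would call panic() instead of returning
 * ALLOC_PANIC.
 */
static enum alloc_result after_failed_attempt(const struct zone_model *zones,
					      size_t nr_zones)
{
	return reserves_depleted(zones, nr_zones) ? ALLOC_PANIC : ALLOC_RETRY;
}
```

The point of the sketch is only that the panic decision lives next to the
retry decision, rather than in the oom killer's "no killable processes" path.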

It's not entirely trivial because we also want to detect situations where 
high-order allocations (order < PAGE_ALLOC_COSTLY_ORDER) are looping forever 
because they keep failing due to fragmentation.  If all cpus are looping 
trying to allocate a task_struct, and there are eligible zones with some 
free memory but it is not allocatable, we still want to panic().
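
To make the fragmentation case concrete, here is a minimal userspace sketch of
a buddy-style free list, assuming one counter of free blocks per order.  The
names (buddy_zone_model, zone_can_allocate, zone_stuck_on_fragmentation) and
the MODEL_MAX_ORDER constant are made up for illustration; this is not how the
kernel's zone structures are laid out.

```c
#include <stdbool.h>

#define MODEL_MAX_ORDER 11	/* assumption for this model only */

/* Hypothetical model: nr_free[o] counts free blocks of 2^o contiguous pages. */
struct buddy_zone_model {
	unsigned long nr_free[MODEL_MAX_ORDER];
};

/* Total free pages in the zone, regardless of how they are split up. */
static unsigned long zone_free_pages(const struct buddy_zone_model *z)
{
	unsigned long pages = 0;

	for (unsigned int o = 0; o < MODEL_MAX_ORDER; o++)
		pages += z->nr_free[o] << o;
	return pages;
}

/* A request can only be served from a free block of its order or higher. */
static bool zone_can_allocate(const struct buddy_zone_model *z,
			      unsigned int order)
{
	for (unsigned int o = order; o < MODEL_MAX_ORDER; o++)
		if (z->nr_free[o] > 0)
			return true;
	return false;
}

/* Free memory exists, yet no block is large enough: the fragmented case. */
static bool zone_stuck_on_fragmentation(const struct buddy_zone_model *z,
					unsigned int order)
{
	return zone_free_pages(z) > 0 && !zone_can_allocate(z, order);
}
```

A zone holding five free order-0 pages and nothing else reports free memory
but cannot serve an order-2 request, which is the state where looping forever
would otherwise go unnoticed.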

> What I'm not sure about in the above approach is the handling of
> !__GFP_NOFAIL && !__GFP_FS allocation requests and the use of
> ALLOC_NO_WATERMARKS without TIF_MEMDIE.
> 
> Basically, we want small allocation requests to succeed unless
> __GFP_NORETRY is given. Currently such allocation requests do not fail
> unless TIF_MEMDIE is given by the OOM killer. But how hard do we want to
> continue looping when we reach (3) because the timeout for waiting for the
> TIF_MEMDIE task at (2) has expired?
> 

In my patch, that is tunable by the user with a new sysctl, which defines 
when the oom killer is considered livelocked because the victim cannot 
exit.  Before that timeout expires, I think we'd do *did_some_progress = 1 
for !__GFP_FS, as is done today.  Once it expires, we'd trigger the oom 
killer livelock detection in my patch to allow the allocation to succeed 
with ALLOC_NO_WATERMARKS.
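
A minimal sketch of that policy, written as a standalone userspace function,
is below.  It is only a model under stated assumptions: MODEL_GFP_FS, the
enum values, and the millisecond timeout parameter are hypothetical stand-ins
(the actual sysctl name and units are not given here), and the branch for
__GFP_FS requests before the timeout is assumed to be the normal oom killer
path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for __GFP_FS; not the kernel's flag value. */
#define MODEL_GFP_FS 0x1u

enum retry_action {
	RETRY_AS_PROGRESS,	/* models *did_some_progress = 1: loop again */
	INVOKE_OOM_KILLER,	/* assumed normal path while a victim may exit */
	ALLOW_NO_WATERMARKS,	/* livelock declared: models ALLOC_NO_WATERMARKS */
};

/*
 * waited_ms:           how long we have waited for the oom victim to exit
 * livelock_timeout_ms: hypothetical user-tunable timeout (the new sysctl)
 */
static enum retry_action oom_retry_policy(unsigned int gfp_mask,
					  unsigned long waited_ms,
					  unsigned long livelock_timeout_ms)
{
	/* Timeout expired: treat the oom killer as livelocked. */
	if (waited_ms >= livelock_timeout_ms)
		return ALLOW_NO_WATERMARKS;

	/* Before expiry, !__GFP_FS requests just report progress and retry. */
	if (!(gfp_mask & MODEL_GFP_FS))
		return RETRY_AS_PROGRESS;

	return INVOKE_OOM_KILLER;
}
```

The ordering of the two checks is the design choice that matters: the timeout
test comes first, so a !__GFP_FS request cannot keep reporting progress
forever once the victim has failed to exit in time.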
