Re: [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks

David Rientjes <rientjes@xxxxxxxxxx> · Mon, 21 Sep 2015 16:33:31 -0700 (PDT)

On Sat, 19 Sep 2015, Tetsuo Handa wrote:

> I think that use of ALLOC_NO_WATERMARKS via TIF_MEMDIE is the underlying
> cause. ALLOC_NO_WATERMARKS via TIF_MEMDIE is intended for terminating the
> OOM victim task as soon as possible, but it turned out that it will not
> work if there is invisible lock dependency. Therefore, why not to give up
> "there should be only up to 1 TIF_MEMDIE task" rule?
> 

I don't see the connection between TIF_MEMDIE and ALLOC_NO_WATERMARKS 
being problematic.  It is simply the mechanism by which we give oom killed 
processes access to memory reserves if they need it.  I believe you are 
referring only to the oom killer stalling when it finds an oom victim.

> What this patch (and many others posted in various forms many times over
> past years) does is to give up "there should be only up to 1 TIF_MEMDIE
> task" rule. I think that we need to tolerate more than 1 TIF_MEMDIE tasks
> and somehow manage in a way memory reserves will not deplete.
> 

Your proposal, which I mostly agree with, tries to kill additional 
processes so that they allocate and drop the lock that the original victim 
depends on.  My approach, from 
http://marc.info/?l=linux-kernel&m=144010444913702, is the same, but 
without the killing.  It's unnecessary to kill every process on the system 
that is depending on the same lock, and we can't know which processes are 
stalling on that lock and which are not.

I think it's much easier to simply identify such a situation where a 
process has not exited in a timely manner and then provide processes 
access to memory reserves without being killed.  We hope that the victim 
will have queued its mutex_lock() and allocators that are holding the lock 
will drop it after successfully utilizing memory reserves.

We can mitigate immediate depletion of memory reserves by requiring every 
allocator to reclaim (or compact) first and then call the oom killer, 
which identifies the timeout and grants access to memory reserves for a 
single allocation before doing schedule_timeout_killable(1) and returning.

I don't know of any alternative solutions where we can guarantee that 
memory reserves cannot be depleted unless memory reserves are 100% of 
memory.
