Michal Hocko wrote:
> On Thu 14-04-16 19:56:31, Tetsuo Handa wrote:
> > Current comment for "Kill all user processes sharing victim->mm in other
> > thread groups" is not clear that doing so is a best effort avoidance.
> >
> > I tried to update that logic along with TIF_MEMDIE for several times
> > but not yet accepted. Therefore, this patch changes only comment so that
> > we can apply now.
> >
> > Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> > ---
> >  mm/oom_kill.c | 29 ++++++++++++++++++++++-------
> >  1 file changed, 22 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index e78818d..43d0002 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -814,13 +814,28 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
> >  	task_unlock(victim);
> >
> >  	/*
> > -	 * Kill all user processes sharing victim->mm in other thread groups, if
> > -	 * any. They don't get access to memory reserves, though, to avoid
> > -	 * depletion of all memory. This prevents mm->mmap_sem livelock when an
>           ^^^^^^^^^
> this was an useful information which you have dropped. Why?
>

Because I don't think setting TIF_MEMDIE on all threads sharing the
victim's memory at oom_kill_process() increases the risk of depleting
the memory reserves: TIF_MEMDIE helps only when such a thread is
actually doing a memory allocation. I explained this at
http://lkml.kernel.org/r/201603162016.EBJ05275.VHMFSOLJOFQtOF@xxxxxxxxxxxxxxxxxxx .

> > -	 * oom killed thread cannot exit because it requires the semaphore and
> > -	 * its contended by another thread trying to allocate memory itself.
> > -	 * That thread will now get access to memory reserves since it has a
> > -	 * pending fatal signal.
> > +	 * Kill all user processes sharing victim->mm in other thread groups,
> > +	 * if any. This reduces possibility of hitting mm->mmap_sem livelock
> > +	 * when an OOM victim thread cannot exit because it requires the
> > +	 * mm->mmap_sem for read at exit_mm() while another thread is trying
> > +	 * to allocate memory with that mm->mmap_sem held for write.
> > +	 *
> > +	 * Any thread except the victim thread itself which is killed by
> > +	 * this heuristic does not get access to memory reserves as of now,
> > +	 * but it will get access to memory reserves by calling out_of_memory()
> > +	 * or mem_cgroup_out_of_memory() since it has a pending fatal signal.
> > +	 *
> > +	 * Note that this heuristic is not perfect because it is possible that
> > +	 * a thread which shares victim->mm and is doing memory allocation with
> > +	 * victim->mm->mmap_sem held for write is marked as OOM_SCORE_ADJ_MIN.
>
> Is this really helpful? I would rather be explicit that we _do not care_
> about these configurations. It is just PITA maintain and it doesn't make
> any sense. So rather than trying to document all the weird thing that
> might happen I would welcome a warning "mm shared with OOM_SCORE_ADJ_MIN
> task. Something is broken in your configuration!"

Would you please stop rejecting configurations just because they do not
match your values? The OOM killer is a safety net against accidental
memory usage. A properly configured system should not need to call
out_of_memory() in the first place; systems you call properly configured
should be using panic_on_oom > 0. What I'm asking for is a workaround
that rescues current users from unexplained silent hangups.
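Going back to the mm->mmap_sem scenario the new comment describes, here is a
rough userspace model of the dependency loop, in case it helps reviewers who
have not hit it in the field. This is illustration only: a pthread rwlock
stands in for mm->mmap_sem, the thread names merely describe the roles, and
nothing below is kernel code.

/*
 * Illustration only -- not kernel code.  A pthread rwlock stands in for
 * mm->mmap_sem; the comments describe the kernel roles being modeled.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t mmap_sem = PTHREAD_RWLOCK_INITIALIZER;

/* Role of a thread that took mmap_sem for write and is now stuck waiting
 * for an allocation that never completes because reclaim makes no progress. */
static void *allocating_thread(void *arg)
{
	(void)arg;
	pthread_rwlock_wrlock(&mmap_sem);
	fprintf(stderr, "allocator: holding mmap_sem for write, waiting for memory\n");
	pause();	/* never returns: the "allocation" never succeeds */
	pthread_rwlock_unlock(&mmap_sem);
	return NULL;
}

/* Role of the OOM victim: exit_mm() needs mmap_sem for read, so it can
 * never finish exiting and never releases its memory. */
static void *exiting_victim(void *arg)
{
	(void)arg;
	sleep(1);	/* let the allocator win the lock first */
	fprintf(stderr, "victim: exiting, waiting for mmap_sem for read\n");
	pthread_rwlock_rdlock(&mmap_sem);	/* blocks forever */
	pthread_rwlock_unlock(&mmap_sem);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, allocating_thread, NULL);
	pthread_create(&b, NULL, exiting_victim, NULL);
	pthread_join(b, NULL);	/* hangs by design: this is the dependency loop */
	return 0;
}

Build it with something like "cc -pthread mmap_sem_hang.c" (a hypothetical
file name) and it hangs by design: the exiting victim can never take the lock
because the writer never finishes its allocation. That is why
oom_kill_process() sends SIGKILL to every process sharing victim->mm: the
pending fatal signal is what lets the allocating thread make progress and
drop mmap_sem.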
>
> > +	 * Also, it is possible that a thread which shares victim->mm and is
> > +	 * doing memory allocation with victim->mm->mmap_sem held for write
> > +	 * (possibly the victim thread itself which got TIF_MEMDIE) is blocked
> > +	 * at unkillable locks from direct reclaim paths because nothing
> > +	 * prevents TIF_MEMDIE threads which already started direct reclaim
> > +	 * paths from being blocked at unkillable locks. In such cases, the
> > +	 * OOM reaper will be unable to reap victim->mm and we will need to
> > +	 * select a different OOM victim.
>
> This is a more general problem and not related to this particular code.
> Whenever we select a victim and call mark_oom_victim we hope it will
> eventually get out of its kernel code path (unless it was running in the
> userspace) so I am not sure this is placed properly.

If the OOM killer is to act as a safety net, we should not ignore corner
cases. Please explain how your approach handles the slowpath.

> >  	 */
> >  	rcu_read_lock();
> >  	for_each_process(p) {
> > --
> > 1.8.3.1
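For readers who do not have the file open, the loop that the quoted context
lines introduce looks roughly like the sketch below. This is reconstructed
from the comment being discussed rather than copied from the tree, so treat
the helper names (process_shares_mm(), can_oom_reap and friends) as
approximations to double-check against the mm/oom_kill.c you are actually
patching.

	/*
	 * Simplified sketch of the loop opened by the quoted context lines,
	 * reconstructed from the comment above, not copied verbatim.
	 */
	rcu_read_lock();
	for_each_process(p) {
		/* Only other processes sharing the victim's mm matter here. */
		if (!process_shares_mm(p, mm))
			continue;
		if (same_thread_group(p, victim))
			continue;
		if (unlikely(p->flags & PF_KTHREAD) || is_global_init(p) ||
		    p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
			/*
			 * An unkillable user of this mm: do not let the OOM
			 * reaper tear the address space down under it.
			 */
			can_oom_reap = false;
			continue;
		}
		/*
		 * No TIF_MEMDIE is set here; the pending SIGKILL is what is
		 * relied upon to get these threads out of the allocator.
		 */
		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
	}
	rcu_read_unlock();

The OOM_SCORE_ADJ_MIN branch is the imperfection the "Note that this
heuristic is not perfect" paragraph is about: such a sharer is not killed
here, and (in the code as I read it) it also prevents the OOM reaper from
reaping victim->mm.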