Re: [PATCH] mm/oom: Suppress unnecessary "sharing same memory" message.

Michal Hocko wrote:
> On Wed 27-05-15 06:39:42, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Mon 25-05-15 23:33:31, Tetsuo Handa wrote:
> > > > From 3728807fe66ebc24a8a28455593754b9532bbe74 Mon Sep 17 00:00:00 2001
> > > > From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> > > > Date: Mon, 25 May 2015 22:26:07 +0900
> > > > Subject: [PATCH] mm/oom: Suppress unnecessary "sharing same memory" message.
> > > > 
> > > > If the mm struct which the OOM victim is using is shared by e.g. 1000
> > > > threads, and the lock dependency prevents all threads except the OOM
> > > > victim thread from terminating until they get the TIF_MEMDIE flag, the OOM
> > > > killer will be invoked 1000 times on this mm struct. As a result,
> > > > the kernel would emit
> > > > 
> > > >   "Kill process %d (%s) sharing same memory\n"
> > > > 
> > > > line 1000 * 1000 / 2 times. But once these threads have a pending SIGKILL,
> > > > emitting this information is nothing but noise. This patch filters them.
> > > 
> > > OK, I can see this might be really annoying. But reducing this message
> > > will not help much because it is the dump_header which generates a lot
> > > of output. And there is clearly no reason to treat the selected victim
> > > any differently than the current so why not simply do the following
> > > instead?
> > > ---
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 5cfda39b3268..a67ce18b4b35 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -505,7 +505,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
> > >  	 * its children or threads, just set TIF_MEMDIE so it can die quickly
> > >  	 */
> > >  	task_lock(p);
> > > -	if (p->mm && task_will_free_mem(p)) {
> > > +	if (p->mm && (fatal_signal_pending(p) || task_will_free_mem(p))) {
> > >  		mark_oom_victim(p);
> > >  		task_unlock(p);
> > >  		put_task_struct(p);
> > > 
> > 
> > I don't think this is good, for this will omit sending SIGKILL to threads
> > sharing p->mm ("Kill all user processes sharing victim->mm in other thread
> > groups, if any.")
> 
> threads? The whole thread group will die when the fatal signal is
> sent to the group leader, no? This mm sharing handling is about
> processes which are sharing mm but are not in the same thread group

OK. I should have said "omit sending SIGKILL to processes which are sharing mm
but are not in the same thread group".

> (aka CLONE_VM without CLONE_SIGHAND resp. CLONE_THREAD).

clone(CLONE_SIGHAND | CLONE_VM) ?
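
For reference, the "shares mm but is not in the same thread group" case can
be reproduced from userspace. The following is only a minimal sketch, assuming
Linux with glibc's clone() wrapper; the helper name
shares_mm_but_not_thread_group() is hypothetical and is not part of the patch.
It uses plain CLONE_VM (without CLONE_SIGHAND or CLONE_THREAD) and checks that
the child's write is visible to the parent while the child reports its own
thread group id.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Written by the child; the parent sees the value only if the mm is shared. */
static volatile pid_t child_tgid;

static int child_fn(void *arg)
{
    (void)arg;
    /* Raw syscall, so the result does not depend on any libc-side pid caching. */
    child_tgid = (pid_t)syscall(SYS_getpid);
    return 0;
}

/*
 * Hypothetical helper: returns 1 if the child shared our memory but had its
 * own thread group id, 0 if not, and -1 if the setup failed.
 */
int shares_mm_but_not_thread_group(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack;
    pid_t pid, waited;
    int status;

    assert(stack_size % 16 == 0);   /* keep the top of the child stack aligned */
    stack = malloc(stack_size);
    if (!stack)
        return -1;
    child_tgid = 0;

    /* CLONE_VM only: no CLONE_SIGHAND, no CLONE_THREAD. The stack grows down. */
    pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited != pid)
        return -1;

    /* Child's pid equals its tgid (own group leader) and differs from ours. */
    return child_tgid == pid && child_tgid != (pid_t)syscall(SYS_getpid);
}
```

A group-directed SIGKILL sent to the parent would not reach such a child,
which is why the OOM killer walks the other processes that use the same mm.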

> 
> > when p already has pending SIGKILL.
> 
> yes we can select a task which has SIGKILL already pending and then
> we wouldn't kill other processes which share the same mm but does it
> matter?  I do not think so. Because if this is really the case and the
> OOM condition continues even after p exits (which is very probable but
> p alone might release some resources and free memory) we will find a
> process with the same mm in the next round.

I think it matters because p cannot call do_exit() when p is blocked by
processes which are sharing mm but are not in the same thread group.

> 
> > By the way, if p with p->mm && task_will_free_mem(p) can get stuck due to
> > memory allocation deadlock, is it OK that currently we are not sending SIGKILL
> > to threads sharing p->mm ?
> 
> I am not sure I understand the question. Threads will die automatically
> because we are sending group signal.

I just imagined a case where p is blocked at down_read() in acct_collect() from
do_exit() when p is sharing mm with other processes, and another process is doing
a blocking operation with mm->mmap_sem held for writing. Is such a case impossible?

do_exit() {
  exit_signals(tsk);  /* sets PF_EXITING */
  acct_collect(code, group_dead) {
    if (group_dead && current->mm) {
      down_read(&current->mm->mmap_sem);
      up_read(&current->mm->mmap_sem);
    }
  }
  exit_mm(tsk) {
    down_read(&mm->mmap_sem);
    tsk->mm = NULL;
    up_read(&mm->mmap_sem);
  }
}
