Re: [BUGFIX][PATCH] memcg: fix oom killer kills a task in other cgroup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 6 Feb 2010 01:30:49 +0900
Minchan Kim <minchan.kim@xxxxxxxxx> wrote:

> Hi, Kame.
> 
> On Fri, Feb 5, 2010 at 9:39 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > Please take this patch in different context with recent discussion.
> > This is a quick-fix for a terrible bug.
> >
> > This patch itself is against mmotm but can be easily applied to mainline or
> > stable tree, I think. (But I don't CC stable tree until I get ack.)
> >
> > ==
> > Now, oom-killer kills process's chidlren at first. But this means
> > a child in other cgroup can be killed. But it's not checked now.
> >
> > This patch fixes that.
> >
> > CC: Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx>
> > CC: Daisuke Nishimura <nishimura@xxxxxxxxxxxxxxxxx>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> > ---
> >  mm/oom_kill.c |    3 +++
> >  1 file changed, 3 insertions(+)
> >
> > Index: mmotm-2.6.33-Feb03/mm/oom_kill.c
> > ===================================================================
> > --- mmotm-2.6.33-Feb03.orig/mm/oom_kill.c
> > +++ mmotm-2.6.33-Feb03/mm/oom_kill.c
> > @@ -459,6 +459,9 @@ static int oom_kill_process(struct task_
> >        list_for_each_entry(c, &p->children, sibling) {
> >                if (c->mm == p->mm)
> >                        continue;
> > +               /* Children may be in other cgroup */
> > +               if (mem && !task_in_mem_cgroup(c, mem))
> > +                       continue;
> >                if (!oom_kill_task(c))
> >                        return 0;
> >        }
> >
> > --
> 
> I am worried about latency of OOM at worst case.
> I mean that task_in_mem_cgroup calls task_lock of child.
> We have used task_lock in many place.
> Some place task_lock hold and then other locks.
> For example, exit_fs held task_lock and try to hold write_lock of fs->lock.
> If child already hold task_lock and wait to write_lock of fs->lock, OOM latency
> is dependent of fs->lock.
> 
> I am not sure how many usecase is also dependent of other locks.
> If it is not as is, we can't make sure in future.
> 
> So How about try_task_in_mem_cgroup?
> If we can't hold task_lock, let's continue next child.
> 
It's recommended not to use trylock in unclear case.

Then, I think possible replacement will be not-to-use any lock in
task_in_mem_cgroup. In my short consideration, I don't think task_lock
is necessary if we can add some tricks and memory barrier.

Please let this patch to go as it is because this is an obvious bug fix
and give me time.

Now, I think of following.
This makes use of the fact mm->owner is changed only at _exit() of the owner.
If there is a race with _exit() and mm->owner is racy, the oom selection
itself was racy and bad.
==
int task_in_mem_cgroup_oom(struct task_struct *tsk, struct mem_cgroup *mem)
{
	struct mm_struct *mm;
	struct task_struct *tsk;
	int ret = 0;

	mm = tsk->mm;
	if (!mm)
		return ret;
	/*
	 * we are not interested in tasks other than owner. mm->owner is
	 * updated when the owner task exits. If the owner is exiting now
	 * (and race with us), we may miss.
	 */
	if (rcu_dereference(mm->owner) != tsk)
		return ret;
	rcu_read_lock();
	/* while this task is alive, this task is the owner */
	if (mem == mem_cgroup_from_task(tsk))
		ret = 1;
	rcu_read_unlock();
	return ret;
}
==
Hmm, it seems no memory barrier is necessary.

Does anyone has another idea ?

Thanks,
-Kame







--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]