On Tue, Feb 9, 2010 at 9:32 AM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Sat, 6 Feb 2010 01:30:49 +0900
> Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>
>> Hi, Kame.
>>
>> On Fri, Feb 5, 2010 at 9:39 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
>> > Please take this patch in a different context from the recent discussion.
>> > This is a quick fix for a terrible bug.
>> >
>> > The patch itself is against mmotm but can be easily applied to mainline or
>> > a stable tree, I think. (But I won't CC the stable tree until I get an ack.)
>> >
>> > ==
>> > Now, the oom-killer kills a process's children first. But this means
>> > a child in another cgroup can be killed, and that is not checked now.
>> >
>> > This patch fixes that.
>> >
>> > CC: Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx>
>> > CC: Daisuke Nishimura <nishimura@xxxxxxxxxxxxxxxxx>
>> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
>> > ---
>> >  mm/oom_kill.c |    3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > Index: mmotm-2.6.33-Feb03/mm/oom_kill.c
>> > ===================================================================
>> > --- mmotm-2.6.33-Feb03.orig/mm/oom_kill.c
>> > +++ mmotm-2.6.33-Feb03/mm/oom_kill.c
>> > @@ -459,6 +459,9 @@ static int oom_kill_process(struct task_
>> >         list_for_each_entry(c, &p->children, sibling) {
>> >                 if (c->mm == p->mm)
>> >                         continue;
>> > +               /* Children may be in another cgroup */
>> > +               if (mem && !task_in_mem_cgroup(c, mem))
>> > +                       continue;
>> >                 if (!oom_kill_task(c))
>> >                         return 0;
>> >         }
>> >
>> > --
>>
>> I am worried about OOM latency in the worst case.
>> task_in_mem_cgroup() takes the child's task_lock, and task_lock is used
>> in many places, sometimes with other locks acquired while it is held.
>> For example, exit_fs() holds task_lock and then tries to take the write
>> lock on fs->lock. If the child already holds task_lock and is waiting
>> for the write lock on fs->lock, OOM latency becomes dependent on
>> fs->lock.
>>
>> I am not sure how many other use cases depend on other locks in the
>> same way, and even if none do today, we can't guarantee that for the
>> future.
>>
>> So how about a try_task_in_mem_cgroup()?
>> If we can't take task_lock, just continue to the next child.
>>
> It's recommended not to use trylock in unclear cases.
>
> Then, I think a possible replacement is to not use any lock in
> task_in_mem_cgroup at all. On short consideration, I don't think
> task_lock is necessary if we add some tricks and a memory barrier.
>
> Please let this patch go in as it is, because it is an obvious bug fix,
> and give me time.

I think it's not only a latency problem for OOM but also a potential
deadlock: we can't know the child's lock state in oom_kill_process().
So if you can remove the lock as in your suggestion, I am okay with
that.

> Now, I think of the following.
> This makes use of the fact that mm->owner is changed only at _exit()
> of the owner. If there is a race with _exit() and mm->owner is racy,
> the oom selection itself was already racy and bad.

It seems to make sense to me.

> ==
> int task_in_mem_cgroup_oom(struct task_struct *tsk, struct mem_cgroup *mem)
> {
>         struct mm_struct *mm;
>         int ret = 0;
>
>         mm = tsk->mm;
>         if (!mm)
>                 return ret;
>         /*
>          * We are not interested in tasks other than the owner.
>          * mm->owner is updated when the owner task exits. If the owner
>          * is exiting now (and races with us), we may miss it.
>          */
>         if (rcu_dereference(mm->owner) != tsk)
>                 return ret;

Yes. In this case, the OOM killer can wait a few seconds until this
task has exited. If we don't do that, we could kill another innocent
task.

>         rcu_read_lock();
>         /* while this task is alive, this task is the owner */
>         if (mem == mem_cgroup_from_task(tsk))
>                 ret = 1;
>         rcu_read_unlock();
>         return ret;
> }
> ==
> Hmm, it seems no memory barrier is necessary.
>
> Does anyone have another idea?
>
> Thanks,
> -Kame
>

--
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .