On Mon, 14 Mar 2011, Oleg Nesterov wrote: > oom_kill_process() starts with victim_points == 0. This means that > (most likely) any child has more points and can be killed erroneously. > > Also, "children has a different mm" doesn't match the reality, we > should check child->mm != t->mm. This check is not exactly correct > if t->mm == NULL but this doesn't really matter, oom_kill_task() > will kill them anyway. > > Note: "Kill all processes sharing p->mm" in oom_kill_task() is wrong > too. > There're two issues you're addressing in this patch. It only kills a child in place of its selected parent when: - the child has a higher badness score, and - it has a different ->mm. In the former case, NACK, we always want to sacrifice children regardless of their badness score (as long as it is non-zero) if it has a separate ->mm in place of its parent, otherwise webservers will be killed instead of one of their children serving a client, sshd could be killed instead of bash, etc. The behavior of the oom killer has always been to try to kill a child with its own ->mm first to avoid losing a large amount of work being done or unnecessarily killing a job scheduler, for example, when sacrificing a child would be satisfactory. It'll kill additional tasks, and perhaps even the parent later if it has no more children, if the oom condition persists. In the latter case, I agree, we should be testing if the child has a different ->mm before sacrificing it for its parent as the comment indicates it will. I proposed that exact change in "oom: avoid deferring oom killer if exiting task is being traced" posted to -mm a couple days ago. > Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx> > --- > > mm/oom_kill.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > --- 38/mm/oom_kill.c~3_fix_kill_chld 2011-03-14 18:52:39.000000000 +0100 > +++ 38/mm/oom_kill.c 2011-03-14 19:36:01.000000000 +0100 > @@ -459,10 +459,10 @@ static int oom_kill_process(struct task_ > struct mem_cgroup *mem, nodemask_t *nodemask, > const char *message) > { > - struct task_struct *victim = p; > + struct task_struct *victim; > struct task_struct *child; > - struct task_struct *t = p; > - unsigned int victim_points = 0; > + struct task_struct *t; > + unsigned int victim_points; > > if (printk_ratelimit()) > dump_header(p, gfp_mask, order, mem, nodemask); > @@ -488,10 +488,15 @@ static int oom_kill_process(struct task_ > * parent. This attempts to lose the minimal amount of work done while > * still freeing memory. > */ > + victim_points = oom_badness(p, mem, nodemask, totalpages); > + victim = p; > + t = p; > do { > list_for_each_entry(child, &t->children, sibling) { > unsigned int child_points; > > + if (child->mm == t->mm) > + continue; > /* > * oom_badness() returns 0 if the thread is unkillable > */ > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>