On Thu, 1 Apr 2010, Oleg Nesterov wrote: > > > > > Say, oom_forkbomb_penalty() does list_for_each_entry(tsk->children). > > > > > Again, this is not right even if we forget about !child->mm check. > > > > > This list_for_each_entry() can only see the processes forked by the > > > > > main thread. > > > > > > > > > > > > > That's the intention. > > > > > > Why? shouldn't oom_badness() return the same result for any thread > > > in thread group? We should take all childs into account. > > > > > > > oom_forkbomb_penalty() only cares about first-descendant children that > > do not share the same memory, > > I see, but the code doesn't really do this. I mean, it doesn't really > see the first-descendant children, only those which were forked by the > main thread. > > Look. We have a main thread M and the sub-thread T. T forks a lot of > processes which use a lot of memory. These processes _are_ the first > descendant children of the M+T thread group, they should be accounted. > But M->children list is empty. > > oom_forkbomb_penalty() and oom_kill_process() should do > > t = tsk; > do { > list_for_each_entry(child, &t->children, sibling) { > ... take child into account ... > } > } while_each_thread(tsk, t); > > In this case, it seems more appropriate that we would penalize T and not M since it's not necessarily responsible for the behavior of the children it forks. T is the buggy/malicious program, not M. > See the patch below. Yes, this is minor, but it is always good to avoid > the unnecessary locks, and thread_group_cputime() is O(N). > > Not only for performance reasons. This allows to change the locking in > thread_group_cputime() if needed without fear to deadlock with task_lock(). > > Oleg. > > --- x/mm/oom_kill.c > +++ x/mm/oom_kill.c > @@ -97,13 +97,16 @@ static unsigned long oom_forkbomb_penalt > return 0; > list_for_each_entry(child, &tsk->children, sibling) { > struct task_cputime task_time; > - unsigned long runtime; > + unsigned long runtime, this_rss; > > task_lock(child); > if (!child->mm || child->mm == tsk->mm) { > task_unlock(child); > continue; > } > + this_rss = get_mm_rss(child->mm); > + task_unlock(child); > + > thread_group_cputime(child, &task_time); > runtime = cputime_to_jiffies(task_time.utime) + > cputime_to_jiffies(task_time.stime); > @@ -113,10 +116,9 @@ static unsigned long oom_forkbomb_penalt > * get to execute at all in such cases anyway. > */ > if (runtime < HZ) { > - child_rss += get_mm_rss(child->mm); > + child_rss += this_rss; > forkcount++; > } > - task_unlock(child); > } > > /* This patch looks good, will you send it to Andrew with a changelog and sign-off line? Also feel free to add: Acked-by: David Rientjes <rientjes@xxxxxxxxxx> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>