Thanks a lot Kosaki for doing this! I still can't find the time to play
with this code :/

On 05/31, KOSAKI Motohiro wrote:
>
> select_bad_process() checks PF_EXITING to detect the task which is going
> to release its memory, but the logic is very wrong.
>
>	- a single process P with the dead group leader disables
>	  select_bad_process() completely, it will always return
>	  ERR_PTR() while P can live forever
>
>	- if the PF_EXITING task has already released its ->mm
>	  it doesn't make sense to expect it is going to free
>	  more memory (except task_struct/etc)
>
> Change the code to ignore the PF_EXITING tasks without ->mm.
>
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -287,7 +287,7 @@ static struct task_struct *select_bad_process(unsigned long *ppoints,
>  		 * the process of exiting and releasing its resources.
>  		 * Otherwise we could get an easy OOM deadlock.
>  		 */
> -		if (p->flags & PF_EXITING) {
> +		if ((p->flags & PF_EXITING) && p->mm) {

(strictly speaking, this change is needed only after 3/5, which removes the
top-level "if (!p->mm)" check in select_bad_process()).

I'd like to add a note... with or without this, we have problems
with the coredump. A thread participating in the coredump
(the group leader in this case) can have PF_EXITING && mm, but this doesn't
mean it is going to exit soon, and the dumper can use a lot more memory.

Otoh, if select_bad_process() chooses the thread which dumps the core,
SIGKILL can't stop it. This should be fixed in the do_coredump() paths;
it is a long-standing problem.

And, as was already discussed, we only check the group leader here.
But I can't suggest anything better.

Oleg.
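
For illustration only, the difference between the old and the patched
check can be modelled in user space. This is a minimal sketch, not kernel
code: struct task, struct mm_stub, old_check(), new_check() and
self_check() are hypothetical stand-ins, and the PF_EXITING value is
assumed for the example. It does not model the coredump case described
above, where a task can have PF_EXITING && mm without being about to exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel flag; the value is an assumption for this sketch. */
#define PF_EXITING 0x00000004u

/* Placeholder for the address space; only its presence matters here. */
struct mm_stub { int unused; };

/* Hypothetical, cut-down task: just the two fields the check looks at. */
struct task {
	unsigned int flags;
	struct mm_stub *mm;	/* NULL once the task has released its memory */
};

/* Old check: any exiting task makes the scan wait for it. */
bool old_check(const struct task *p)
{
	return (p->flags & PF_EXITING) != 0;
}

/* Patched check: only an exiting task that still owns an mm counts. */
bool new_check(const struct task *p)
{
	return (p->flags & PF_EXITING) && p->mm != NULL;
}

/* Walks through the three cases from the changelog. */
void self_check(void)
{
	struct mm_stub mm = { 0 };
	struct task dead_leader = { PF_EXITING, NULL };	/* exited, mm gone */
	struct task exiting     = { PF_EXITING, &mm };	/* exiting, mm held */
	struct task running     = { 0, &mm };		/* ordinary task */

	assert(old_check(&dead_leader));	/* old: scan waits on it */
	assert(!new_check(&dead_leader));	/* new: ignored */
	assert(new_check(&exiting));		/* still worth waiting for */
	assert(!new_check(&running));
}
```

The dead_leader case corresponds to the first point in the changelog: with
the old check such a task is always treated as about to free memory, even
though its ->mm is already gone.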