On Tue, 15 Mar 2011, Oleg Nesterov wrote:

> What I can't understand is what exactly the first patch tries to fix.
> When I ask you, you tell me that for_each_process() can miss the group
> leader because it can exit before sub-threads. This must not happen,
> or we have some serious bug triggered by your workload.
>
> So, once again. Could you please explain the original problem and how
> this patch helps?
>

[trimming cc list with a less worrisome subject line]

A process in a cpuset by itself (or with other processes that are
OOM_DISABLE) runs out of memory while handling page faults.  It is
selected as the last possible target by the oom killer and gets killed.
All of its children are reparented to init (yet they have the same cpuset
restrictions as the parent and are oom as well) and call do_exit().
do_exit() happens to require memory while handling proc_exit_connector()
and triggers an oom itself.  There are no eligible threads left to be
found in the for_each_process() loop, which results in a panic.  The
remaining children of the oom killed process spin in the page allocator
because they cannot acquire the zone locks necessary for calling the oom
killer themselves -- this isn't really important since they would panic
the machine as well if they did call out_of_memory().

Instead, we want do_each_thread() to identify these threads that are
eligible for oom kill because they have the same intersecting set of
allowed nodes (regardless of whether they are reparented to init or not)
and give them access to memory reserves so that they may finish
allocating slab for proc_exit_connector() and exit.  Anything else will
unnecessarily panic the machine, and that's why
oom-prevent-unnecessary-oom-kills-or-kernel-panics.patch fixes the issue.