unnecessary oom killer panics in 2.6.38 (was Re: Linux 2.6.38)

On Tue, 15 Mar 2011, Oleg Nesterov wrote:

> What I can't understand is what exactly the first patch tries to fix.
> When I ask you, you tell me that for_each_process() can miss the group
> leader because it can exit before sub-threads. This must not happen,
> or we have some serious bug triggered by your workload.
> 
> So, once again. Could you please explain the original problem and how
> this patch helps?
> 

[trimming cc list with a less worrisome subject line]

A process in a cpuset by itself (or with other processes that are 
OOM_DISABLE) runs out of memory while handling page faults.  It is 
selected as the last possible target by the oom killer and gets killed.  
All of its children are reparented to init (yet they have the same 
cpuset restrictions as the parent and are oom as well) and call do_exit().  
do_exit() happens to require memory while handling proc_exit_connector() 
and triggers an oom itself.  There are no eligible threads left to be 
found in the for_each_process() loop, which results in a panic.  The 
remaining children of the oom killed process spin in the page allocator 
because they cannot acquire the zone locks necessary for calling the oom 
killer themselves -- this isn't really important since they would panic 
the machine as well if they did call out_of_memory().

Instead, we want do_each_thread() to identify these threads that are 
eligible for oom kill because they have the same intersecting set of 
allowed nodes (regardless of whether they are reparented to init or not) 
and give them access to memory reserves so that they may finish allocating 
slab for proc_exit_connector() and exit.  Anything else will unnecessarily 
panic the machine and that's why 
oom-prevent-unnecessary-oom-kills-or-kernel-panics.patch fixes the issue.
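
To make the difference between the two walks concrete, here is a minimal 
user-space sketch.  It is not kernel code and not the patch itself: the 
struct, the enum and the toy_* helpers are all hypothetical names invented 
for illustration, and the model simply assumes that the exiting tasks are 
not visible as group leaders to the process-level walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are NOT the kernel's data structures. */
struct toy_task {
	bool group_leader;           /* seen by a process-level walk */
	bool exiting;                /* models a task inside do_exit() */
	bool oom_disabled;           /* models OOM_DISABLE */
	unsigned long allowed_nodes; /* bitmask of allowed memory nodes */
};

enum toy_outcome { TOY_PANIC, TOY_KILL, TOY_GRANT_RESERVES };

/* Eligible if killable and its allowed nodes intersect the oom nodes. */
static bool toy_eligible(const struct toy_task *t, unsigned long oom_nodes)
{
	return !t->oom_disabled && (t->allowed_nodes & oom_nodes) != 0;
}

/*
 * per_thread == false models a for_each_process() style walk (leaders
 * only); per_thread == true models a do_each_thread() style walk.
 */
static enum toy_outcome toy_scan(const struct toy_task *tasks, size_t n,
				 unsigned long oom_nodes, bool per_thread)
{
	bool found_victim = false;

	assert(tasks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct toy_task *t = &tasks[i];

		if (!per_thread && !t->group_leader)
			continue;
		if (!toy_eligible(t, oom_nodes))
			continue;
		/* An eligible exiting task only needs reserves to finish. */
		if (t->exiting)
			return TOY_GRANT_RESERVES;
		found_victim = true;
	}
	/* Nothing eligible at all is the case that panics the machine. */
	return found_victim ? TOY_KILL : TOY_PANIC;
}
```

With one unkillable leader and one exiting non-leader task restricted to 
the oom node, the leader-only walk in this model finds nothing and ends in 
TOY_PANIC, while the per-thread walk returns TOY_GRANT_RESERVES.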
