Re: [PATCH for 3.2] memcg: do not trap chargers with full callstack on OOM

On Mon 24-06-13 16:13:45, Johannes Weiner wrote:
> Hi guys,
> 
> On Sat, Jun 22, 2013 at 10:09:58PM +0200, azurIt wrote:
> > >> But I'm sure of one thing: when the problem occurs, nothing is able to
> > >> access the hard drives (every process that tries is frozen until the
> > >> problem is resolved or the server is rebooted).
> > >
> > >It would be really interesting to see what those tasks are blocked on.
> > 
> > I'm trying to get it, stay tuned :)
> > 
> > Today I noticed one bug. I'm not 100% sure it is related to 'your' patch,
> > but I haven't seen it before. I noticed that I have lots of cgroups
> > that cannot be removed: if I do 'rmdir <cgroup_directory>', it
> > just hangs and never completes. Moreover, it's not possible to
> > access the whole cgroup filesystem until I kill that rmdir
> > (anything that tries just hangs). All unremovable cgroups have
> > this in 'memory.oom_control': oom_kill_disable 0 under_oom 1
> 
> Somebody acquires the OOM wait reference to the memcg and marks it
> under oom but then does not call into mem_cgroup_oom_synchronize() to
> clean up.  That's why under_oom is set and the rmdir waits for
> outstanding references.
> 
> > And, yes, 'tasks' file is empty.
> 
> It's not a kernel thread that does it because all kernel-context
> handle_mm_fault() are annotated properly, which means the task must be
> userspace and, since tasks is empty, have exited before synchronizing.

Yes, well spotted. I missed that while reviewing your patch.
The follow-up fix looks correct.
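
For anyone wanting to check whether a group is in the state reported
above (under_oom set while the tasks file is empty), here is a minimal
userspace sketch. It is only an illustration: the struct and the helpers
parse_oom_control() and looks_stuck() are hypothetical names, not kernel
or library API, and the sketch assumes the caller has already read the
contents of 'memory.oom_control' and 'tasks' into strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed view of a memory.oom_control file. */
struct oom_control {
	int oom_kill_disable;
	int under_oom;
};

/*
 * Parse text such as "oom_kill_disable 0 under_oom 1" (the fields may be
 * separated by spaces or newlines).
 * Returns 0 when both fields were found, -1 otherwise.
 */
static int parse_oom_control(const char *text, struct oom_control *out)
{
	const char *p;
	int have_disable = 0, have_under = 0;

	assert(text != NULL && out != NULL);

	p = strstr(text, "oom_kill_disable");
	if (p && sscanf(p, "oom_kill_disable %d", &out->oom_kill_disable) == 1)
		have_disable = 1;

	p = strstr(text, "under_oom");
	if (p && sscanf(p, "under_oom %d", &out->under_oom) == 1)
		have_under = 1;

	return (have_disable && have_under) ? 0 : -1;
}

/*
 * A group matches the report in this thread when it is marked under_oom
 * although its tasks file contains nothing but whitespace.
 */
static int looks_stuck(const struct oom_control *oc, const char *tasks_text)
{
	size_t skip = strspn(tasks_text, " \t\r\n");

	return oc->under_oom == 1 && tasks_text[skip] == '\0';
}
```

A caller would read both files for each cgroup directory and flag the
ones where looks_stuck() returns nonzero. This only detects the symptom;
it does not tell which task left the OOM wait reference behind.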

> Can you try with the following patch on top?
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 5db0490..9a0b152 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -846,17 +846,6 @@ static noinline int
>  mm_fault_error(struct pt_regs *regs, unsigned long error_code,
>  	       unsigned long address, unsigned int fault)
>  {
> -	/*
> -	 * Pagefault was interrupted by SIGKILL. We have no reason to
> -	 * continue pagefault.
> -	 */
> -	if (fatal_signal_pending(current)) {
> -		if (!(fault & VM_FAULT_RETRY))
> -			up_read(&current->mm->mmap_sem);
> -		if (!(error_code & PF_USER))
> -			no_context(regs, error_code, address);
> -		return 1;
> -	}
>  	if (!(fault & VM_FAULT_ERROR))
>  		return 0;
>  

-- 
Michal Hocko
SUSE Labs
