Re: [patch] memcg: add oom killer delay

KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> · Thu, 24 Feb 2011 09:13:48 +0900

On Wed, 23 Feb 2011 15:08:50 -0800
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, 9 Feb 2011 14:19:50 -0800 (PST)
> David Rientjes <rientjes@xxxxxxxxxx> wrote:
> 
> > Completely disabling the oom killer for a memcg is problematic if
> > userspace is unable to address the condition itself, usually because it
> > is unresponsive.  This scenario creates a memcg deadlock: tasks are
> > sitting in TASK_KILLABLE waiting for the limit to be increased, a task to
> > exit or move, or the oom killer reenabled and userspace is unable to do
> > so.
> > 
> > An additional possible use case is to defer oom killing within a memcg
> > for a set period of time, probably to prevent unnecessary kills due to
> > temporary memory spikes, before allowing the kernel to handle the
> > condition.
> > 
> > This patch adds an oom killer delay so that a memcg may be configured to
> > wait at least a pre-defined number of milliseconds before calling the oom
> > killer.  If the oom condition persists for this number of milliseconds,
> > the oom killer will be called the next time the memory controller
> > attempts to charge a page (and memory.oom_control is set to 0).  This
> > allows userspace to have a short period of time to respond to the
> > condition before deferring to the kernel to kill a task.
> > 
> > Admins may set the oom killer delay using the new interface:
> > 
> > 	# echo 60000 > memory.oom_delay_millisecs
> > 
> > This will defer oom killing to the kernel only after 60 seconds has
> > elapsed by putting the task to sleep for 60 seconds.  When setting
> > memory.oom_delay_millisecs, all pending delays have their charges retried
> > and, if necessary, the new delay is then enforced.
> > 
> > The delay is cleared the first time the memcg is oom to avoid unnecessary
> > waiting when userspace is unresponsive for future oom conditions.  It may
> > be set again using the above interface to enforce a delay on the next
> > oom.
> > 
> > When a memory.oom_delay_millisecs is set for a cgroup, it is propagated
> > to all children memcg as well and is inherited when a new memcg is
> > created.
> 
> Your patch still stinks!
> 
> If userspace can't handle a disabled oom-killer then userspace
> shouldn't have disabled the oom-killer.
> 
> How do we fix this properly?
> 
> A little birdie tells me that the offending userspace oom handler is
> running in a separate memcg and is not itself running out of memory. 
> The problem is that the userspace oom handler is also taking peeks into
> processes which are in the stressed memcg and is getting stuck on
> mmap_sem in the procfs reads.  Correct?
> 

Hmm, I think memcg's oom-kill just happens under down_read(mmap_sem). 
And all tasks, which is under oom, will be in wait-queue.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>