Re: [patch] memcg: add oom killer delay

KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> · Wed, 22 Dec 2010 17:17:49 +0900

On Tue, 21 Dec 2010 23:59:24 -0800
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Tue, 21 Dec 2010 23:27:25 -0800 (PST) David Rientjes <rientjes@xxxxxxxxxx> wrote:
> 
> > Completely disabling the oom killer for a memcg is problematic if
> > userspace is unable to address the condition itself, usually because
> > userspace is unresponsive.  This scenario creates a memcg livelock:
> > tasks are continuously trying to allocate memory and nothing is getting
> > killed, so memory freeing is impossible since reclaim has failed, and
> > all work stalls with no remedy in sight.
> 
> Userspace was buggy, surely.  If userspace has elected to disable the
> oom-killer then it should ensure that it can cope with the ensuing result.
> 
> One approach might be to run a mlockall()ed watchdog which monitors the
> worker tasks via shared memory.  Another approach would be to run that
> watchdog in a different memcg, without mlockall().  There are surely
> plenty of other ways of doing it.
> 
> > This patch adds an oom killer delay so that a memcg may be configured to
> > wait at least a pre-defined number of milliseconds before calling the
> > oom killer.  If the oom condition persists for this number of
> > milliseconds, the oom killer will be called the next time the memory
> > controller attempts to charge a page (and memory.oom_control is set to
> > 0).  This allows userspace to have a short period of time to respond to
> > the condition before timing out and deferring to the kernel to kill a
> > task.
> > 
> > Admins may set the oom killer timeout using the new interface:
> > 
> > 	# echo 60000 > memory.oom_delay
> > 
> > This will defer oom killing to the kernel only after 60 seconds has
> > elapsed.  When setting memory.oom_delay, all pending timeouts are
> > restarted.
> > 
> 
> eww, ick ick ick.
> 
> 
> Minutiae:
> 
> - changelog and docs forgot to mention that oom_delay=0 disables.
> 
> - it's called oom_kill_delay in the kernel and oom_delay in userspace.
> 
> - oom_delay_millisecs would be a better name for the pseudo file.
> 
> - Also, ick.
> 

This seems hard to use. No one can estimate how many milliseconds are
needed to avoid an OOM kill. I think this is very bad. Nack to the feature itself.

If you want something smart _in kernel_, please implement the following.

 - When the memcg hits oom, enlarge the limit to some extent.
 - Stop all processes in the cgroup.
 - Call a helper application via usermode_helper().
 - When the helper application exits, automatically release all processes
   to run again.

Then you can avoid the oom-kill situation automatically, with the kernel's help.
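
The four steps above can be modelled in plain user-space C to make the
ordering concrete. This is only a sketch under assumptions, not kernel code:
struct memcg_model, charge_with_helper(), example_helper() and the "extra"
headroom parameter are hypothetical names, the helper is called synchronously
as a stand-in for a real helper application, and whether the enlarged limit
should later be restored is left open because it is not specified above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a memory cgroup (not a kernel structure). */
struct memcg_model {
	size_t usage;   /* bytes currently charged */
	size_t limit;   /* current hard limit in bytes */
	bool   stopped; /* true while the group's tasks are held */
};

/* Stand-in for the helper application; it "exits" when it returns. */
typedef void (*oom_helper_fn)(struct memcg_model *cg);

/* Example helper (an assumption): frees half of the charged memory. */
static void example_helper(struct memcg_model *cg)
{
	/* The tasks must already be stopped while the helper runs. */
	assert(cg->stopped);
	cg->usage /= 2;
}

/* True if 'bytes' more can be charged without exceeding the limit. */
static bool charge_fits(const struct memcg_model *cg, size_t bytes)
{
	return cg->usage <= cg->limit && bytes <= cg->limit - cg->usage;
}

/*
 * Try to charge 'bytes'. On oom: enlarge the limit by 'extra', stop the
 * tasks, run the helper, release the tasks, then retry the charge once.
 * Returns 0 on success, -1 if the charge still does not fit.
 */
static int charge_with_helper(struct memcg_model *cg, size_t bytes,
			      size_t extra, oom_helper_fn helper)
{
	if (charge_fits(cg, bytes)) {
		cg->usage += bytes;
		return 0;
	}

	cg->limit += extra;   /* step 1: enlarge the limit to some extent */
	cg->stopped = true;   /* step 2: stop all processes in the group  */
	helper(cg);           /* step 3: run the helper application       */
	cg->stopped = false;  /* step 4: helper exited, release the tasks */

	if (!charge_fits(cg, bytes))
		return -1;    /* helper could not make enough room */

	cg->usage += bytes;
	return 0;
}
```

In this model the caller never picks a timeout: the tasks stay stopped for
exactly as long as the helper runs, which is the point of the proposal.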

BTW, don't call cgroup_lock(). It's always dangerous. You can add your own
lock.

Thanks,
-Kame
