On Tue, 21 Dec 2010 23:59:24 -0800 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Tue, 21 Dec 2010 23:27:25 -0800 (PST) David Rientjes <rientjes@xxxxxxxxxx> wrote:
>
> > Completely disabling the oom killer for a memcg is problematic if
> > userspace is unable to address the condition itself, usually because
> > userspace is unresponsive. This scenario creates a memcg livelock:
> > tasks are continuously trying to allocate memory and nothing is getting
> > killed, so memory freeing is impossible since reclaim has failed, and
> > all work stalls with no remedy in sight.
>
> Userspace was buggy, surely. If userspace has elected to disable the
> oom-killer then it should ensure that it can cope with the ensuing result.
>
> One approach might be to run a mlockall()ed watchdog which monitors the
> worker tasks via shared memory. Another approach would be to run that
> watchdog in a different memcg, without mlockall(). There are surely
> plenty of other ways of doing it.
>
> > This patch adds an oom killer delay so that a memcg may be configured to
> > wait at least a pre-defined number of milliseconds before calling the
> > oom killer. If the oom condition persists for this number of
> > milliseconds, the oom killer will be called the next time the memory
> > controller attempts to charge a page (and memory.oom_control is set to
> > 0). This allows userspace to have a short period of time to respond to
> > the condition before timing out and deferring to the kernel to kill a
> > task.
> >
> > Admins may set the oom killer timeout using the new interface:
> >
> > 	# echo 60000 > memory.oom_delay
> >
> > This will defer oom killing to the kernel only after 60 seconds has
> > elapsed. When setting memory.oom_delay, all pending timeouts are
> > restarted.
> >
>
> eww, ick ick ick.
>
>
> Minutiae:
>
> - changelog and docs forgot to mention that oom_delay=0 disables.
>
> - it's called oom_kill_delay in the kernel and oom_delay in userspace.
>
> - oom_delay_millisecs would be a better name for the pseudo file.
>
> - Also, ick.
>

This seems hard to use. No one can estimate the right number of
milliseconds for avoiding an OOM kill. I think this is very bad.

Nack to this feature itself.

If you want something smart _in kernel_, please implement the following:

 - When the memcg hits oom, enlarge the limit to some extent.
 - Stop all processes in the cgroup.
 - Call a helper application via usermode_helper().
 - When the helper application exit()s, automatically release all
   processes to run again.

Then you can avoid the oom-kill situation automatically, with the
kernel's help.

BTW, don't call cgroup_lock(). It's always dangerous. You can add your
own lock.

Thanks,
-Kame
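
For reference, the timeout semantics described in the quoted changelog can be modelled in a few lines of userspace Python. This is only a toy sketch, not the patch itself: the class name OomDelayModel and its methods are hypothetical, time is passed in explicitly as milliseconds, and it assumes that oom_delay=0 means "no delay" (one reading of the review note above) and that the oom killer is otherwise enabled for the memcg.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OomDelayModel:
    """Toy model of the proposed memory.oom_delay behaviour (hypothetical, not kernel code)."""

    delay_ms: int = 0                    # assumption: 0 means "no delay"
    oom_since_ms: Optional[int] = None   # when the current oom condition was first seen

    def set_delay(self, delay_ms: int, now_ms: int) -> None:
        # Per the changelog, writing memory.oom_delay restarts pending timeouts.
        self.delay_ms = delay_ms
        if self.oom_since_ms is not None:
            self.oom_since_ms = now_ms

    def charge_failed(self, now_ms: int) -> bool:
        """A charge hit the limit and reclaim failed.

        Returns True if the oom killer would be invoked on this attempt.
        """
        if self.delay_ms == 0:
            return True
        if self.oom_since_ms is None:
            # First failure of this oom episode: start the clock, kill nothing yet.
            self.oom_since_ms = now_ms
            return False
        # Kill only once the condition has persisted for the configured delay.
        return now_ms - self.oom_since_ms >= self.delay_ms

    def oom_resolved(self) -> None:
        # Userspace (or reclaim) fixed the condition before the timeout expired.
        self.oom_since_ms = None
```

With a delay of 60000, a charge that fails at t=1000 starts the clock, and the first charge attempt at or after t=61000 is the one that invokes the oom killer. The model also makes the objection above concrete: whoever picks delay_ms has to guess how long userspace needs to respond.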