On Thu 12-12-13 11:31:59, Michal Hocko wrote: [...] > The semantic would be as simple as "notification is sent only when > an action is due". It will be still racy as nothing prevents a task > which is not under OOM to exit and release some memory but there is no > sensible way to address that. On the other hand such a semantic would be > sensible for oom_control listeners because they will know that an action > has to be or will be taken (the line was drawn). > > Can we agree on this, Johannes? Or you see the line drawn when > mem_cgroup_oom_synchronize has been reached already no matter whether > the action is to be done or not? Something like the following: >From 5d9c01e2814a7ade49db7945ad3890f4f138855e Mon Sep 17 00:00:00 2001 From: Michal Hocko <mhocko@xxxxxxx> Date: Thu, 12 Dec 2013 11:50:17 +0100 Subject: [PATCH] memcg: notify userspace about OOM when and action is due Userspace is currently notified about OOM condition after fails to reclaim any memory after MEM_CGROUP_RECLAIM_RETRIES rounds. This usually means that the memcg is really in troubles and an OOM action (either done by userspace or kernel) has to be taken. The kernel OOM killer however bails out and doesn't kill anything if it sees an already dying/exiting task in a good hope a memory will be released and OOM situation will be resolved. Therefore it makes sense to notify userspace only after really all measures have been taken and an userspace action is required or the kernel kills a task. This patch also removes fatal_signal_pending and PF_EXITING check from mem_cgroup_oom_synchronize because __mem_cgroup_try_charge already checks for both and bypasses charge so we cannot end up in the oom path. Signed-off-by: Michal Hocko <mhocko@xxxxxxx> --- mm/memcontrol.c | 17 ++++------------- mm/oom_kill.c | 5 +++++ 2 files changed, 9 insertions(+), 13 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 98900c070045..af7148c77bac 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2235,16 +2235,6 @@ bool mem_cgroup_oom_synchronize(bool handle) if (!handle) goto cleanup; - /* - * If current has a pending SIGKILL or is exiting, then automatically - * select it. The goal is to allow it to allocate so that it may - * quickly exit and free its memory. - */ - if (fatal_signal_pending(current) || current->flags & PF_EXITING) { - set_thread_flag(TIF_MEMDIE); - goto cleanup; - } - owait.memcg = memcg; owait.wait.flags = 0; owait.wait.func = memcg_oom_wake_function; @@ -2256,15 +2246,16 @@ bool mem_cgroup_oom_synchronize(bool handle) locked = mem_cgroup_oom_trylock(memcg); - if (locked) - mem_cgroup_oom_notify(memcg); - if (locked && !memcg->oom_kill_disable) { mem_cgroup_unmark_under_oom(memcg); finish_wait(&memcg_oom_waitq, &owait.wait); + /* calls mem_cgroup_oom_notify if there is a task to kill */ mem_cgroup_out_of_memory(memcg, current->memcg_oom.gfp_mask, current->memcg_oom.order); } else { + if (locked && memcg->oom_kill_disable) + mem_cgroup_oom_notify(memcg); + schedule(); mem_cgroup_unmark_under_oom(memcg); finish_wait(&memcg_oom_waitq, &owait.wait); diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 1e4a600a6163..47c9de8da36d 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -394,6 +394,8 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order, dump_tasks(memcg, nodemask); } +extern void mem_cgroup_oom_notify(struct mem_cgroup *memcg); + #define K(x) ((x) << (PAGE_SHIFT-10)) /* * Must be called while holding a reference to p, which will be released upon @@ -470,6 +472,9 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, victim = p; } + if (memcg) + mem_cgroup_oom_notify(memcg); + /* mm cannot safely be dereferenced after task_unlock(victim) */ mm = victim->mm; pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n", -- 1.8.4.4 -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>