Re: [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves

Michal Hocko <mhocko@xxxxxxx> · Thu, 12 Dec 2013 13:37:36 +0100

On Thu 12-12-13 13:11:40, Michal Hocko wrote:
> On Thu 12-12-13 11:31:59, Michal Hocko wrote:
> [...]
> > The semantic would be as simple as "notification is sent only when
> > an action is due". It will be still racy as nothing prevents a task
> > which is not under OOM to exit and release some memory but there is no
> > sensible way to address that. On the other hand such a semantic would be
> > sensible for oom_control listeners because they will know that an action
> > has to be or will be taken (the line was drawn).
> > 
> > Can we agree on this, Johannes? Or you see the line drawn when
> > mem_cgroup_oom_synchronize has been reached already no matter whether
> > the action is to be done or not?
> 
> Something like the following:

I forgot to mention that this patch assumes "memcg: Do not hang on OOM
when killed by userspace OOM"

> From 5d9c01e2814a7ade49db7945ad3890f4f138855e Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@xxxxxxx>
> Date: Thu, 12 Dec 2013 11:50:17 +0100
> Subject: [PATCH] memcg: notify userspace about OOM when and action is due
> 
> Userspace is currently notified about OOM condition after fails
> to reclaim any memory after MEM_CGROUP_RECLAIM_RETRIES rounds.
> This usually means that the memcg is really in troubles and an
> OOM action (either done by userspace or kernel) has to be taken.
> The kernel OOM killer however bails out and doesn't kill anything
> if it sees an already dying/exiting task in a good hope a memory
> will be released and OOM situation will be resolved.
> 
> Therefore it makes sense to notify userspace only after really all
> measures have been taken and an userspace action is required or
> the kernel kills a task.
> 
> This patch also removes fatal_signal_pending and PF_EXITING check from
> mem_cgroup_oom_synchronize because __mem_cgroup_try_charge already
> checks for both and bypasses charge so we cannot end up in the oom path.

Hmm, I have just noticed that oom_scan_process_thread aborts scanning
only if it sees PF_EXITING or TIF_MEMDIE. Why the same is not done for 
fatal_signal_pending tasks as well? Following the same logic as for the
current we should do that no?

The different sets of checks is so confusing :/

> Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
> ---
>  mm/memcontrol.c | 17 ++++-------------
>  mm/oom_kill.c   |  5 +++++
>  2 files changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 98900c070045..af7148c77bac 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2235,16 +2235,6 @@ bool mem_cgroup_oom_synchronize(bool handle)
>  	if (!handle)
>  		goto cleanup;
>  
> -	/*
> -	 * If current has a pending SIGKILL or is exiting, then automatically
> -	 * select it.  The goal is to allow it to allocate so that it may
> -	 * quickly exit and free its memory.
> -	 */
> -	if (fatal_signal_pending(current) || current->flags & PF_EXITING) {
> -		set_thread_flag(TIF_MEMDIE);
> -		goto cleanup;
> -	}
> -
>  	owait.memcg = memcg;
>  	owait.wait.flags = 0;
>  	owait.wait.func = memcg_oom_wake_function;
> @@ -2256,15 +2246,16 @@ bool mem_cgroup_oom_synchronize(bool handle)
>  
>  	locked = mem_cgroup_oom_trylock(memcg);
>  
> -	if (locked)
> -		mem_cgroup_oom_notify(memcg);
> -
>  	if (locked && !memcg->oom_kill_disable) {
>  		mem_cgroup_unmark_under_oom(memcg);
>  		finish_wait(&memcg_oom_waitq, &owait.wait);
> +		/* calls mem_cgroup_oom_notify if there is a task to kill */
>  		mem_cgroup_out_of_memory(memcg, current->memcg_oom.gfp_mask,
>  					 current->memcg_oom.order);
>  	} else {
> +		if (locked && memcg->oom_kill_disable)
> +			mem_cgroup_oom_notify(memcg);
> +
>  		schedule();
>  		mem_cgroup_unmark_under_oom(memcg);
>  		finish_wait(&memcg_oom_waitq, &owait.wait);
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 1e4a600a6163..47c9de8da36d 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -394,6 +394,8 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
>  		dump_tasks(memcg, nodemask);
>  }
>  
> +extern void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
> +
>  #define K(x) ((x) << (PAGE_SHIFT-10))
>  /*
>   * Must be called while holding a reference to p, which will be released upon
> @@ -470,6 +472,9 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  		victim = p;
>  	}
>  
> +	if (memcg)
> +		mem_cgroup_oom_notify(memcg);
> +
>  	/* mm cannot safely be dereferenced after task_unlock(victim) */
>  	mm = victim->mm;
>  	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
> -- 
> 1.8.4.4
> 
> -- 
> Michal Hocko
> SUSE Labs
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>