On Thu 22-08-19 04:56:29, Yafang Shao wrote:
> - Why we need a per memcg oom_score_adj setting ?
> This is easy to deploy and very convenient for container.
> When we use container, we always treat memcg as a whole, if we have a per
> memcg oom_score_adj setting we don't need to set it process by process.

Why can't an initial process in the cgroup set the oom_score_adj so that
the other processes simply inherit it from there? This sounds trivial to
do with a startup script.

> It will make the user exhausted to set it to all processes in a memcg.

Then let's have scripts to set it, as they are less prone to exhaustion ;)
But seriously

> In this patch, a file named memory.oom.score_adj is introduced.
> The valid value of it is from -1000 to +1000, which is same with
> process-level oom_score_adj.
> When OOM is invoked, the effective oom_score_adj is as bellow,
> 	effective oom_score_adj = original oom_score_adj + memory.oom.score_adj

This doesn't make any sense to me. Say that a process has oom_score_adj
-1000 (never kill); then the group oom_score_adj will simply break that
expectation and the task becomes killable for any value but -1000.
Why is summing up those values even sensible?

> The valid effective value is also from -1000 to +1000.
> This is something like a hook to re-calculate the oom_score_adj.

Besides that, what are the hierarchical semantics? Say you have the
hierarchy

	A (oom_score_adj = 1000)
	 \
	  B (oom_score_adj = 500)
	   \
	    C (oom_score_adj = -1000)

What is the effective value for tasks in C (put the above summing up
aside for now and just focus on the memcg adjusting)?

--
Michal Hocko
SUSE Labs
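
For illustration of the inheritance route suggested above, a minimal
userspace sketch (not code from the patch under review; the adjustment
value 500 is arbitrary). It relies on the fact that oom_score_adj is
inherited across fork() and preserved across execve(), so a container
init can set it once before spawning everything else:

/*
 * Illustrative sketch only, not code from the patch: write the desired
 * oom_score_adj once in the container's first process, then exec the
 * real workload; every descendant inherits the adjustment.
 */
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	FILE *f = fopen("/proc/self/oom_score_adj", "w");

	if (f) {
		fprintf(f, "500\n");	/* example value */
		fclose(f);
	}

	/* Everything exec'd from here on inherits the adjustment. */
	if (argc > 1)
		execvp(argv[1], &argv[1]);

	return 1;
}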
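
And as a worked example of the objection to the summing rule, a small
standalone sketch (not kernel code; the memcg value here stands in for
the proposed memory.oom.score_adj setting, which is not an existing
interface) showing how an oom-disabled task would end up killable:

/*
 * Illustrative sketch only: the proposed
 *	effective oom_score_adj = task oom_score_adj + memory.oom.score_adj
 * clamped to [-1000, 1000].  A task marked "never kill" (-1000) in a
 * memcg with a positive adjustment comes out killable.
 */
#include <stdio.h>

#define OOM_SCORE_ADJ_MIN	(-1000)
#define OOM_SCORE_ADJ_MAX	1000

static int clamp_adj(int adj)
{
	if (adj < OOM_SCORE_ADJ_MIN)
		return OOM_SCORE_ADJ_MIN;
	if (adj > OOM_SCORE_ADJ_MAX)
		return OOM_SCORE_ADJ_MAX;
	return adj;
}

int main(void)
{
	int task_adj = OOM_SCORE_ADJ_MIN;	/* oom disabled */
	int memcg_adj = 500;			/* hypothetical memory.oom.score_adj */

	/* Prints -500: the task is no longer oom disabled. */
	printf("effective oom_score_adj = %d\n", clamp_adj(task_adj + memcg_adj));
	return 0;
}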