Re: [patch -mm v2 2/3] mm, memcg: replace cgroup aware oom killer mount option with tunable

David Rientjes <rientjes@xxxxxxxxxx> · Fri, 26 Jan 2018 14:20:24 -0800 (PST)

On Thu, 25 Jan 2018, Andrew Morton wrote:

> > Now that each mem cgroup on the system has a memory.oom_policy tunable to
> > specify oom kill selection behavior, remove the needless "groupoom" mount
> > option that requires (1) the entire system to be forced, perhaps
> > unnecessarily, perhaps unexpectedly, into a single oom policy that
> > differs from the traditional per process selection, and (2) a remount to
> > change.
> > 
> > Instead of enabling the cgroup aware oom killer with the "groupoom" mount
> > option, set the mem cgroup subtree's memory.oom_policy to "cgroup".
> 
> Can we retain the groupoom mount option and use its setting to set the
> initial value of every memory.oom_policy?  That way the mount option
> remains somewhat useful and we're back-compatible?
> 

-ECONFUSED.  We want to have a mount option that has the sole purpose of 
doing echo cgroup > /mnt/cgroup/memory.oom_policy?

Please note that this patchset does not only remove a mount option; it 
allows oom policies to be configured per subtree such that users to whom 
you delegate those subtrees cannot evade the oom policy that is set at 
a higher level.  The goal is to keep the user from needing to organize 
their hierarchy in a specific way to work around this constraint, and to 
use things like limiting the number of child cgroups that user is allowed 
to create only to work around the oom policy.  With the single cgroup v2 
hierarchy, that severely limits the amount of control the user has over 
their processes, because they are locked into a very specific hierarchy 
configuration solely so that the user cannot evade the oom kill.
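
To make the evasion concrete, here is a toy model in Python.  It is a 
hypothetical sketch, not kernel code and not the actual selection logic 
of either patchset (it ignores oom_score_adj and the root mem cgroup 
entirely).  Cgroup, pick_leaf() and pick_subtree() are names made up for 
illustration, and the sizes are arbitrary.  It contrasts comparing 
individual leaf cgroups with comparing the aggregated usage of each 
subtree beneath the level where the policy is set:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Cgroup:
    """Hypothetical model of a mem cgroup; usage is in GB for readability."""
    name: str
    usage: int = 0  # memory charged to this cgroup's own processes
    children: list[Cgroup] = field(default_factory=list)

    def total(self) -> int:
        # Hierarchical usage: this cgroup plus all of its descendants.
        return self.usage + sum(c.total() for c in self.children)


def leaves(cg: Cgroup) -> Iterator[Cgroup]:
    # Yield every cgroup in the tree that has no children.
    if not cg.children:
        yield cg
    for child in cg.children:
        yield from leaves(child)


def pick_leaf(root: Cgroup) -> Cgroup:
    # Leaf comparison: the largest single leaf cgroup is chosen.
    return max(leaves(root), key=lambda c: c.usage)


def pick_subtree(root: Cgroup) -> Cgroup:
    # Subtree comparison: each child of the policy root is one unit,
    # so a delegated user cannot shrink by creating more child cgroups.
    return max(root.children, key=lambda c: c.total())


if __name__ == "__main__":
    worker = Cgroup("worker", usage=6)
    user = Cgroup("user", children=[Cgroup(f"job{i}", usage=2) for i in range(4)])
    root = Cgroup("root", children=[worker, user])
    print(pick_leaf(root).name, pick_subtree(root).name)
```

Splitting 8GB across four 2GB child cgroups is enough for the delegated 
subtree to look smaller than a single 6GB worker when leaves are 
compared; comparing at the subtree level removes that incentive.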

This, along with fixes to fairly compare the root mem cgroup with leaf mem 
cgroups, is essential before the feature is merged; otherwise it yields 
wildly unpredictable (and unexpected, since its interaction with 
oom_score_adj isn't documented) results, as I already demonstrated, where 
cgroups with 1GB of usage are killed instead of 6GB workers outside of 
that subtree.