Re: cgroup-aware OOM killer, how to move forward

David Rientjes <rientjes@xxxxxxxxxx> · Fri, 20 Jul 2018 13:28:56 -0700 (PDT)

On Fri, 20 Jul 2018, Tejun Heo wrote:

> > process chosen for oom kill.  I know that you care about the latter.  My 
> > *only* suggestion was for the tunable to take a string instead of a 
> > boolean so it is extensible for future use.  This seems like something so 
> > trivial.
> 
> So, I'd much prefer it as boolean.  It's a fundamentally binary
> property, either handle the cgroup as a unit when chosen as oom victim
> or not, nothing more.

With the single hierarchy mandate of cgroup v2, the need arises to 
separate processes from a single job into subcontainers for use with 
controllers other than mem cgroup.  In that case, we have no functionality 
to oom kill all processes in the subtree.

A boolean can kill all processes attached to the victim's mem cgroup, but 
cannot kill all processes in a subtree if the limit of a common ancestor 
is reached.  The common ancestor is needed to enforce a single memory 
limit but allow for processes to be constrained separately with other 
controllers. 
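
To make the distinction concrete, here is a rough sketch in Python of the
two behaviours.  It is a toy model, not kernel code: the Cgroup class, the
"job", "job/net" and "job/io" names, the pids, and both helper functions
are made up purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cgroup:
    """Toy model of a cgroup v2 node; not a kernel structure."""
    name: str
    pids: List[int] = field(default_factory=list)
    children: List["Cgroup"] = field(default_factory=list)

    def add(self, child: "Cgroup") -> "Cgroup":
        self.children.append(child)
        return child


def kill_cgroup(victim: Cgroup) -> List[int]:
    """Boolean-style group oom: only pids attached to the victim's cgroup."""
    return sorted(victim.pids)


def kill_subtree(ancestor: Cgroup) -> List[int]:
    """Subtree-style group oom: every pid at or below the ancestor."""
    pids = list(ancestor.pids)
    for child in ancestor.children:
        pids.extend(kill_subtree(child))
    return sorted(pids)


# Hypothetical job: a single memory limit on "job", with its processes
# split into subcontainers so other controllers can constrain them apart.
job = Cgroup("job")
net = job.add(Cgroup("job/net", pids=[101, 102]))
io = job.add(Cgroup("job/io", pids=[201]))

print(kill_cgroup(net))   # [101, 102]
print(kill_subtree(job))  # [101, 102, 201]
```

In this model, killing only the victim's cgroup leaves pid 201 in job/io
running even though the limit that was hit belongs to the whole job.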

So if group oom takes on a boolean type, then we mandate that all 
processes to be killed must share the same cgroup, which cannot always be 
done.  Thus, I was suggesting that group oom can also be configured for 
subtree killing when the limit of a shared ancestor is reached.  This 
applies only to non-leaf cgroups.  So non-leaf and leaf cgroups have 
mutually exclusive group oom settings; if we have two tunables, which the 
boolean approach would otherwise require, the setting of one would always 
be irrelevant depending on whether the cgroup is a leaf or not.
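
As a rough illustration of a single string-valued tunable, something like
the following Python sketch is what I have in mind.  The value names
"none", "cgroup" and "tree" and both function names are assumptions made
up for this example, not an existing interface.

```python
from enum import Enum


class GroupOom(Enum):
    """Hypothetical values for a string-valued group oom tunable."""
    NONE = "none"      # kill only the chosen process
    CGROUP = "cgroup"  # kill everything attached to the victim's cgroup
    TREE = "tree"      # kill everything in the subtree of the ancestor


def parse_group_oom(value: str) -> GroupOom:
    """Parse the tunable; unknown strings are rejected."""
    try:
        return GroupOom(value.strip())
    except ValueError:
        raise ValueError(f"unsupported group oom mode: {value!r}") from None


def is_meaningful(mode: GroupOom, is_leaf: bool) -> bool:
    """Assumed rule: cgroup kill applies to leaves, tree kill to non-leaves."""
    if mode is GroupOom.NONE:
        return True
    if mode is GroupOom.CGROUP:
        return is_leaf
    return not is_leaf
```

With one tunable, a given cgroup only ever carries the one setting that
applies to it, and further modes can be added later as new strings.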