On Fri, 20 Jul 2018, Tejun Heo wrote:

> > process chosen for oom kill.  I know that you care about the latter.  My
> > *only* suggestion was for the tunable to take a string instead of a
> > boolean so it is extensible for future use.  This seems like something so
> > trivial.
>
> So, I'd much prefer it as boolean.  It's a fundamentally binary
> property, either handle the cgroup as a unit when chosen as oom victim
> or not, nothing more.

With the single hierarchy mandate of cgroup v2, the need arises to
separate processes from a single job into subcontainers for use with
controllers other than mem cgroup.  In that case, we have no
functionality to oom kill all processes in the subtree.

A boolean can kill all processes attached to the victim's mem cgroup, but
cannot kill all processes in a subtree if the limit of a common ancestor
is reached.  The common ancestor is needed to enforce a single memory
limit but allow for processes to be constrained separately with other
controllers.

So if group oom takes on a boolean type, then we mandate that all
processes to be killed must share the same cgroup which cannot always be
done.  Thus, I was suggesting that group oom can also configure for
subtree killing when the limit of a shared ancestor is reached.  This is
unique only to non-leaf cgroups.  So non-leaf and leaf cgroups have
mutually exclusive group oom settings; if we have two tunables, which this
would otherwise require, the setting of one would always be irrelevant
based on non-leaf or leaf.
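
To make the distinction concrete, here is a rough userspace sketch of the
two behaviors on a toy hierarchy.  It is only an illustration, not kernel
code: the Cgroup class, the mode strings "cgroup" and "tree", and the
cgroup names and pids are all hypothetical and do not describe any
existing or proposed interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cgroup:
    """Toy model of a cgroup v2 node (hypothetical, not a kernel structure)."""
    name: str
    pids: List[int] = field(default_factory=list)
    children: List["Cgroup"] = field(default_factory=list)


def attached_pids(cg: Cgroup) -> List[int]:
    """Processes attached directly to this cgroup only."""
    return list(cg.pids)


def subtree_pids(cg: Cgroup) -> List[int]:
    """Processes attached to this cgroup or to any of its descendants."""
    pids = list(cg.pids)
    for child in cg.children:
        pids.extend(subtree_pids(child))
    return pids


def oom_victims(cg: Cgroup, mode: str) -> List[int]:
    """Return the pids that would be killed for the given group oom mode.

    The mode strings are assumptions made up for this sketch:
      "cgroup": kill only what is attached to the cgroup itself
      "tree":   kill everything in the cgroup's subtree
    """
    if mode == "cgroup":
        return attached_pids(cg)
    if mode == "tree":
        return subtree_pids(cg)
    raise ValueError(f"unknown mode: {mode!r}")


# One job split into subcontainers so that other controllers can constrain
# its processes separately; the single memory limit lives on "job".
cpu_a = Cgroup("job/cpu-a", pids=[101, 102])
io_b = Cgroup("job/io-b", pids=[201])
job = Cgroup("job", children=[cpu_a, io_b])

# Per-cgroup kill on a leaf covers only that leaf's processes.
print(oom_victims(cpu_a, "cgroup"))  # [101, 102]
# Per-cgroup kill on the ancestor covers nothing, since it holds no processes.
print(oom_victims(job, "cgroup"))    # []
# Subtree kill on the ancestor covers the whole job.
print(oom_victims(job, "tree"))      # [101, 102, 201]
```

In this sketch the "tree" mode is only meaningful on the non-leaf "job"
cgroup and the "cgroup" mode only on the leaves, which is why a single
setting with more than two values would cover both cases.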