On Fri, 20 Jul 2018, Roman Gushchin wrote:

> > > > process chosen for oom kill. I know that you care about the latter. My
> > > > *only* suggestion was for the tunable to take a string instead of a
> > > > boolean so it is extensible for future use. This seems like something so
> > > > trivial.
> > >
> > > So, I'd much prefer it as boolean. It's a fundamentally binary
> > > property, either handle the cgroup as a unit when chosen as oom victim
> > > or not, nothing more.
> >
> > With the single hierarchy mandate of cgroup v2, the need arises to
> > separate processes from a single job into subcontainers for use with
> > controllers other than mem cgroup. In that case, we have no functionality
> > to oom kill all processes in the subtree.
> >
> > A boolean can kill all processes attached to the victim's mem cgroup, but
> > cannot kill all processes in a subtree if the limit of a common ancestor
> > is reached.
>
> Why so?
>
> Once again my proposal:
> as soon as the OOM killer selected a victim task,
> we'll look at the victim task's memory cgroup.
> If memory.oom.group is not set, we're done.
> Otherwise let's traverse the memory cgroup tree up to
> the OOMing cgroup (or root) as long as memory.oom.group is set.
> Kill the last cgroup entirely (including all children).
>

I know this is your proposal; I'm suggesting a context-based extension that depends on which mem cgroup is oom: the common ancestor or the leaf.

Consider /A, /A/b, and /A/c, with memory.oom.group set to 1 for all of them. When /A, /A/b, or /A/c is oom, all processes attached to /A and its subtree are oom killed per your semantic. That occurs when any of the three mem cgroups is oom.

I'm suggesting that it may become useful to kill an entire subtree when the common ancestor, /A, is oom, but not when /A/b or /A/c is oom. There is no way to specify this with the proposal for trees where the limits of /A/b + /A/c exceed the limit of /A. We want all processes killed in /A/b or /A/c if they reach their individual limits.
We want all processes killed in /A's subtree if /A reaches its limit.

I am not asking for that support to be implemented immediately if you do not have a need for it. But I am asking that your interface be extensible so that we may implement it. Given the no internal process constraint of cgroup v2, defining this as two separate tunables would always leave one effective and the other irrelevant, so I suggest overloading a single tunable instead.
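For reference, the traversal Roman describes above can be modeled roughly as follows. This is a minimal userspace sketch, not kernel code: `struct memcg` and `select_oom_group()` are hypothetical names, the `oom_group` field stands in for memory.oom.group, and `oom_domain` stands in for the OOMing cgroup (pass the root for a global oom).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a memory cgroup (not struct mem_cgroup). */
struct memcg {
    const char *name;
    struct memcg *parent; /* NULL for the root */
    bool oom_group;       /* models memory.oom.group */
};

/*
 * Sketch of the proposed selection: start at the victim task's memcg and
 * walk up toward the OOMing memcg (or the root) for as long as oom_group
 * is set. Returns the highest such memcg, to be killed entirely along with
 * all of its children, or NULL if only the victim task should be killed.
 */
struct memcg *select_oom_group(struct memcg *victim, struct memcg *oom_domain)
{
    struct memcg *group = NULL;

    assert(victim != NULL);
    for (struct memcg *m = victim; m != NULL; m = m->parent) {
        if (!m->oom_group)
            break;      /* flag not set: stop climbing */
        group = m;      /* this memcg is killed as a unit */
        if (m == oom_domain)
            break;      /* never climb above the OOMing memcg */
    }
    return group;
}
```

With a victim in /A/b and /A as the OOMing memcg, the walk returns /A when the flag is set on both, and /A/b when it is set only on the leaf. Note that a single boolean per memcg is the only input besides the OOMing memcg, which is the limitation the context-based extension above is meant to address.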