On Mon, Jul 16, 2018 at 09:19:18PM -0700, David Rientjes wrote:
> On Fri, 13 Jul 2018, Roman Gushchin wrote:
>
> > > > > All cgroup v2 files do not need to be boolean and the only way you can add
> > > > > a subtree oom kill is to introduce yet another file later. Please make it
> > > > > tristate so that you can define a mechanism of default (process only),
> > > > > local cgroup, or subtree, and so we can avoid adding another option later
> > > > > that conflicts with the proposed one. This should be easy.
> > > >
> > > > David, we're adding a cgroup v2 knob, and in cgroup v2 a memory cgroup
> > > > either has a sub-tree, either attached processes. So, there is no difference
> > > > between local cgroup and subtree.
> > > >
> > >
> > > Uhm, what? We're talking about a common ancestor reaching its limit, so
> > > it's oom, and it has multiple immediate children with their own processes
> > > attached. The difference is killing all processes attached to the
> > > victim's cgroup or all processes under the oom mem cgroup's subtree.
> > >
> >
> > But it's a binary decision, no?
> > If memory.group_oom set, the whole sub-tree will be killed. Otherwise not.
> >
>
> No, if memory.max is reached and memory.group_oom is set, my understanding
> of your proposal is that a process is chosen and all eligible processes
> attached to its mem cgroup are oom killed. My desire for a tristate is so
> that it can be specified that all processes attached to the *subtree* are
> oom killed. With single unified hierarchy mandated by cgroup v2, we can
> separate descendant cgroups for use with other controllers and enforce
> memory.max by an ancestor.
>
> Making this a boolean value is only preventing it from becoming
> extensible. If memory.group_oom only is effective for the victim's mem
> cgroup, it becomes impossible to specify that all processes in the subtree
> should be oom killed as a result of the ancestor limit without adding yet
> another tunable.

Let me illustrate my proposal with examples. Let's say we have the following
hierarchy, and the biggest process (or the process with the highest
oom_score_adj) is in D.

  /
  |
  A
  |
  B
 / \
C   D

Let's look at different examples and the intended behavior:

1) system-wide OOM
  - default settings: the biggest process is killed
  - D/memory.group_oom=1: all processes in D are killed
  - A/memory.group_oom=1: all processes in A are killed

2) memcg OOM in B
  - default settings: the biggest process is killed
  - A/memory.group_oom=1: the biggest process is killed
  - B/memory.group_oom=1: all processes in B are killed
  - D/memory.group_oom=1: all processes in D are killed

Please note that processes can't be attached directly to A and B,
so "all processes in A are killed" means that all processes in the
sub-tree are killed.

Immortal processes (oom_score_adj=-1000) are excluded.

I believe this model is complete and doesn't require any further extension.

Thanks!
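
The cases listed above can be modelled with a small user-space sketch. This
is not the kernel implementation: Cgroup, Proc, the size field, build() and
oom_kill() are hypothetical names used only for illustration. Two assumptions
go beyond what is stated above: the victim is ranked by size alone, and if
several cgroups between the victim and the OOM domain have memory.group_oom
set, the one closest to the OOM domain is used.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

OOM_SCORE_ADJ_MIN = -1000  # "immortal" processes are never killed


@dataclass(eq=False)
class Proc:
    name: str
    size: int  # hypothetical stand-in for the real badness score
    oom_score_adj: int = 0


@dataclass(eq=False)
class Cgroup:
    name: str
    parent: Optional["Cgroup"] = None
    group_oom: bool = False  # models memory.group_oom
    procs: List[Proc] = field(default_factory=list)
    children: List["Cgroup"] = field(default_factory=list)

    def add_child(self, name: str) -> "Cgroup":
        child = Cgroup(name, parent=self)
        self.children.append(child)
        return child


def subtree(cg: Cgroup) -> Iterator[Tuple[Cgroup, Proc]]:
    """Yield (cgroup, process) for every process in cg's sub-tree."""
    for proc in cg.procs:
        yield cg, proc
    for child in cg.children:
        yield from subtree(child)


def killable(proc: Proc) -> bool:
    return proc.oom_score_adj != OOM_SCORE_ADJ_MIN


def oom_kill(domain: Cgroup) -> List[str]:
    """Return the sorted names of processes killed for an OOM in `domain`."""
    eligible = [(cg, p) for cg, p in subtree(domain) if killable(p)]
    if not eligible:
        return []
    victim_cg, victim = max(eligible, key=lambda pair: pair[1].size)

    # Walk from the victim's cgroup up to the OOM domain and remember the
    # last (topmost) cgroup with group_oom set. Ancestors above the OOM
    # domain are never looked at.
    group = None
    cg: Optional[Cgroup] = victim_cg
    while cg is not None:
        if cg.group_oom:
            group = cg
        if cg is domain:
            break
        cg = cg.parent

    if group is None:
        return [victim.name]  # default: only the biggest process
    return sorted(p.name for _, p in subtree(group) if killable(p))


def build() -> Dict[str, Cgroup]:
    """Build the / -> A -> B -> {C, D} hierarchy from the example."""
    root = Cgroup("/")
    a = root.add_child("A")
    b = a.add_child("B")
    c = b.add_child("C")
    d = b.add_child("D")
    c.procs = [Proc("c1", 10), Proc("c2", 20)]
    d.procs = [Proc("d1", 50), Proc("d2", 5),
               Proc("d_immortal", 99, OOM_SCORE_ADJ_MIN)]
    return {"/": root, "A": a, "B": b, "C": c, "D": d}
```

Calling oom_kill() on the root models case 1, and calling it on B models
case 2. Setting group_oom on A changes the result only in case 1, because A
lies outside the OOM domain when the limit is hit in B.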