On Fri, Jul 13, 2018 at 02:34:49PM -0700, David Rientjes wrote:
> On Wed, 11 Jul 2018, Roman Gushchin wrote:
>
> > I was thinking about how to move forward with the cgroup-aware OOM killer.
> > It looks to me that we all agree on the "cleanup" part of the patchset:
> > it's a nice feature to be able to kill all tasks in a cgroup
> > to guarantee a consistent state of the workload.
> > All our disagreements are related to the victim selection algorithm.
> >
> > So I wonder if the right thing to do is to split the problem.
> > We can agree on the "cleanup" part, which is useful by itself,
> > merge it upstream, and then return to the victim selection
> > algorithm.
> >
> > So, here is my proposal:
> > let's introduce the memory.group_oom knob with the following semantics:
> > if the knob is set, the OOM killer can kill either none or all
> > tasks in the cgroup*.
> > It works perfectly with the current OOM killer (as a "cleanup" option)
> > and allows _any_ further approach to OOM victim selection.
> > It also doesn't require any mount/boot/tree-wide options.
> >
> > How does it sound?
>
> No objection, of course; this was always the mechanism vs. policy
> separation that I was referring to. Having the ability to kill all
> processes attached to the cgroup when one of its processes is selected is
> useful, and we have our own patches that do just that, with the exception
> that it's triggerable by the user.

Perfect! I'll prepare the patchset.

> One of the things that I really like about cgroup v2, though, is what
> appears to be an implicit, but rather apparent, goal to minimize the
> number of files for each controller. It's very clean. So I'd suggest
> that we consider memory.group_oom, or however it is named, to allow for
> future development.
>
> For example, rather than simply being binary, we'd probably want the
> ability to kill all eligible processes attached directly to the victim's
> mem cgroup *or* all processes attached to its subtree as well.
>
> I'd suggest it be implemented to accept a string, "default"/"process",
> "local", or "tree"/"hierarchy", or better names, to define the group OOM
> mechanism for the mem cgroup that is OOM when one of its processes is
> selected as a victim.

I would prefer to keep it boolean to match the simplicity of the
cgroup v2 API. In the v2 hierarchy, processes can't be attached to
non-leaf cgroups, so I don't see a place for a third meaning.

> > * More precisely: if the OOM killer kills a task,
> > it will traverse the cgroup tree up to the OOM domain (the OOMing memcg or root),
> > looking for the highest-level cgroup with group_oom set. Then it will
> > kill all tasks in that cgroup, if one exists.
>
> All such processes that are not oom disabled, yes.

Yep, of course.

Thanks!
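
P.S. To make the footnote above concrete, here is a rough, untested sketch
of what the traversal could look like. Everything in it is illustrative
only: find_group_oom_root() and the group_oom field are hypothetical names
standing in for whatever the patchset ends up using; struct mem_cgroup and
parent_mem_cgroup() are the existing memcg type and helper.

/*
 * Walk from the victim's memcg up to the OOM domain (the OOMing memcg,
 * or the root memcg for a global OOM), remembering the highest-level
 * cgroup on the path that has group_oom set.
 */
static struct mem_cgroup *find_group_oom_root(struct mem_cgroup *victim,
					      struct mem_cgroup *oom_domain)
{
	struct mem_cgroup *memcg;
	struct mem_cgroup *group = NULL;

	for (memcg = victim; memcg; memcg = parent_mem_cgroup(memcg)) {
		if (memcg->group_oom)	/* hypothetical boolean flag */
			group = memcg;
		if (memcg == oom_domain)
			break;
	}

	/*
	 * NULL means no cgroup on the path opted in, so only the selected
	 * task is killed; otherwise the caller kills all tasks in the
	 * returned cgroup, skipping the oom-disabled ones.
	 */
	return group;
}

Userspace would then opt in per cgroup with something like
"echo 1 > /sys/fs/cgroup/<group>/memory.group_oom" (the file name is taken
from the proposal above and may well change).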