On Tue, Jan 16, 2018 at 06:14:58PM -0800, David Rientjes wrote:
> There are three significant concerns about the cgroup aware oom killer as
> it is implemented in -mm:
>
>  (1) allows users to evade the oom killer by creating subcontainers or
>      using other controllers since scoring is done per cgroup and not
>      hierarchically,
>
>  (2) does not allow the user to influence the decisionmaking, such that
>      important subtrees cannot be preferred or biased, and
>
>  (3) unfairly compares the root mem cgroup using completely different
>      criteria than leaf mem cgroups and allows wildly inaccurate results
>      if oom_score_adj is used.
>
> This patchset aims to fix (1) completely and, by doing so, introduces a
> completely extensible user interface that can be expanded in the future.
>
> It eliminates the mount option for the cgroup aware oom killer entirely
> since it is now enabled through the root mem cgroup's oom policy.
>
> It eliminates a pointless tunable, memory.oom_group, that unnecessarily
> pollutes the mem cgroup v2 filesystem and is invalid when cgroup v2 is
> mounted with the "groupoom" option.

You're introducing a new oom_policy knob, which has two separate sets
of possible values for the root and non-root cgroups. I don't think
it aligns with the existing cgroup v2 design.

For the root cgroup it works exactly like the mount option, and both
the "none" and "cgroup" values have no meaning outside of the root
cgroup. We can discuss whether a knob on the root cgroup is better than
a mount option or not (I don't think so), but it has nothing to do with
oom policy as you define it for non-root cgroups.

For non-root cgroups you're introducing "all" and "tree", and the _only_
difference is that in the "all" mode all processes will be killed,
rather than only the biggest one as in the "tree" mode. I find these
names confusing; in reality it's more "evaluate together and kill all"
and "evaluate together and kill one".

So, it's not really the fully hierarchical approach which I thought
you were arguing for.
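
To make the comparison concrete, here is a rough Python sketch of how
I read the two non-root modes. It is not kernel code: Task,
subtree_score() and pick_victims() are hypothetical names made up for
illustration, and summing per-task page counts is only an assumed
stand-in for the real scoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    pid: int
    rss_pages: int  # assumed stand-in for the task's memory footprint


def subtree_score(tasks: List[Task]) -> int:
    """Both modes evaluate the subtree as a single unit (assumed: plain sum)."""
    return sum(t.rss_pages for t in tasks)


def pick_victims(tasks: List[Task], policy: str) -> List[Task]:
    """Return the tasks to kill once this subtree has been selected."""
    if policy == "all":
        # "evaluate together and kill all"
        return list(tasks)
    if policy == "tree":
        # "evaluate together and kill one": only the biggest task
        return [max(tasks, key=lambda t: t.rss_pages)]
    raise ValueError(f"unknown policy: {policy!r}")
```

In this sketch the scoring step is identical for both values, and only
the size of the victim set differs.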

You can easily do the same by adding a third value to the
memory.groupoom knob, as I've suggested earlier (say, "disable", "kill"
and "evaluate"), and it will be much less confusing.

Thanks!

Roman