On Tue, 23 Jan 2018, Michal Hocko wrote:

> > It can't, because the current patchset locks the system into a single
> > selection criteria that is unnecessary and the mount option would become a
> > no-op after the policy per subtree becomes configurable by the user as
> > part of the hierarchy itself.
>
> This is simply not true! OOM victim selection has changed in the
> past and will be always a subject to changes in future. Current
> implementation doesn't provide any externally controlable selection
> policy and therefore the default can be assumed. Whatever that default
> means now or in future. The only contract added here is the kill full
> memcg if selected and that can be implemented on _any_ selection policy.
>

The current implementation of memory.oom_group is based on top of a
selection implementation that is broken in three ways I have listed for
months:

 - it allows users to intentionally or unintentionally evade the oom
   killer; fixing that requires not locking the selection implementation
   for the entire system and requires subtree control to prevent the
   evasion, which makes a mount option obsolete and breaks existing users
   who would use the implementation based on 4.16 if this were merged,

 - it unfairly compares the root mem cgroup vs leaf mem cgroups, such
   that users must structure their hierarchy only for 4.16 in such a way
   that _all_ processes are under hierarchical control and have no power
   to create sub cgroups because of the point above, and it completely
   breaks any user of oom_score_adj in a completely undocumented and
   unspecified way, such that fixing that breakage would also break any
   existing users who would use the implementation based on 4.16 if this
   were merged, and

 - it does not allow userspace to protect important cgroups, which can
   be built on top.

I'm focused on fixing the breakage in the first two points since it
affects the API and we don't want to switch that out from under the user.
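As a rough illustration of the first point above, here is a toy model,
not kernel code. It assumes that victim selection simply picks the leaf
cgroup with the largest usage, which is a simplification of the patchset
under discussion. The function name, cgroup names, and sizes are all
hypothetical.

```python
# Toy model only: this is not kernel code. It assumes the victim is the
# leaf cgroup with the largest usage. All names and numbers are hypothetical.

def pick_victim(leaf_usage_mb):
    """Return the name of the leaf cgroup with the largest usage (in MB)."""
    return max(leaf_usage_mb, key=leaf_usage_mb.get)

# User A runs everything in one cgroup; user B uses far less in total.
single = {"A": 10240, "B": 2048}

# User A spreads the same 10240 MB across ten child cgroups.
split = {f"A/child{i}": 1024 for i in range(10)}
split["B"] = 2048

print(pick_victim(single))
print(pick_victim(split))
```

In this model the single large cgroup is selected in the first case, but
once the same usage is split across children, no individual child is the
largest leaf and the smaller user is selected instead. Whether the real
implementation behaves this way depends on details the model leaves out.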

I have brought these points up repeatedly and everybody else has actively
disengaged from development, so I'm proposing incremental changes that
give the cgroup aware oom killer a sustainable API and make it useful
beyond a highly specialized usecase where everything is containerized,
nobody can create subcgroups, and nobody uses oom_score_adj to break the
root mem cgroup accounting.