On Thu, Sep 14, 2017 at 03:40:14PM +0200, Michal Hocko wrote: > On Wed 13-09-17 14:56:07, Roman Gushchin wrote: > > On Wed, Sep 13, 2017 at 02:29:14PM +0200, Michal Hocko wrote: > [...] > > > I strongly believe that comparing only leaf memcgs > > > is more straightforward and it doesn't lead to unexpected results as > > > mentioned before (kill a small memcg which is a part of the larger > > > sub-hierarchy). > > > > One of two main goals of this patchset is to introduce cgroup-level > > fairness: bigger cgroups should be affected more than smaller, > > despite the size of tasks inside. I believe the same principle > > should be used for cgroups. > > Yes bigger cgroups should be preferred but I fail to see why bigger > hierarchies should be considered as well if they are not kill-all. And > whether non-leaf memcgs should allow kill-all is not entirely clear to > me. What would be the usecase? We definitely want to support kill-all for non-leaf cgroups. A workload can consist of several cgroups and we want to clean up the whole thing on OOM. I don't see any reasons to limit this functionality to leaf cgroups only. Hierarchies are memory consumers, we do account their usage, we do apply limits and guarantees for the hierarchies. The same is with OOM victim selection: we are reclaiming memory from the biggest consumer. Kill-all knob only defines the way _how_ we do that: by killing one or all processes. Just for example, we might want to take memory.low into account at some point: prefer cgroups which are above their guarantees, avoid killing those who fit. It would be hard if we're comparing cgroups from different hierarchies. The same will be with introducing oom_priorities, which is much more required functionality. > Consider that it might be not your choice (as a user) how deep is your > leaf memcg. I can already see how people complain that their memcg has > been killed just because it was one level deeper in the hierarchy... The kill-all functionality is enforced by parent, and it seems to be following the overall memcg design. The parent cgroup enforces memory limit, memory low limit, etc. I don't know why OOM control should be different. > > I would really start simple and only allow kill-all on leaf memcgs and > only compare leaf memcgs & root. If we ever need to kill whole > hierarchies then allow kill-all on intermediate memcgs as well and then > consider cumulative consumptions only on those that have kill-all > enabled. This sounds hacky to me: the whole thing is depending on cgroup v2 and is additionally explicitly opt-in. Why do we need to introduce such incomplete functionality first, and then suffer trying to extend it and provide backward compatibility? Also, I think we should compare root cgroup with top-level cgroups, rather than leaf cgroups. A process in the root cgroup is definitely system-level entity, and we should compare it with other top-level entities (other containerized workloads), rather then some random leaf cgroup deep inside the tree. If we decided, that we're not comparing random tasks from different cgroups, why should we do this for leaf cgroups? Is sounds like making only one step towards right direction, while we can do more. > > Or do I miss any reasonable usecase that would suffer from such a > semantic? Kill-all for sub-trees is definitely required. Enforcing oom_priorities for sub-trees is something that I would expect very useful too. Comparing leaf cgroups system-wide instead of processes doesn't sound good for me, we're lacking hierarchical fairness, which was one of two goals of this patchset. Thanks! -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html