On Tue 30-01-18 14:38:40, David Rientjes wrote:
> On Tue, 30 Jan 2018, Michal Hocko wrote:
>
> > > > So what are the actual semantics and scope of this policy? Does it
> > > > apply only down the hierarchy? Also, how do you compare cgroups with
> > > > different policies? Let's say you have
> > > >           root
> > > >         /  |  \
> > > >        A   B   C
> > > >       / \     / \
> > > >      D   E   F   G
> > > >
> > > > Assume A: cgroup, B: oom_group=1, C: tree, G: oom_group=1
> > > >
> > >
> > > At each level of the hierarchy, memory.oom_policy compares immediate
> > > children, it's the only way that an admin can lock in a specific oom
> > > policy like "tree" and then delegate the subtree to the user.  If you've
> > > configured it as above, comparing A and C should be the same based on the
> > > cumulative usage of their child mem cgroups.
> >
> > So cgroup == tree if we are memcg aware OOM killing, right? Why do we
> > need both then? Just to make memcg aware OOM killing possible?
> >
>
> We need "tree" to account the usage of the subtree rather than simply the
> cgroup alone, but "cgroup" and "tree" are accounted with the same units.
> In your example, D and E are treated as individual memory consumers and C
> is treated as the sum of all subtree memory consumers.

It seems my question was still not clear. What difference does
policy=cgroup vs. none make on A? Also, what difference does it make
when a leaf node has the cgroup policy?

[...]

> > So now you have a killable cgroup selected by process criterion? That
> > just doesn't make any sense. So I guess it would at least be necessary
> > to enforce (cgroup || tree) to allow oom_group.
> >
>
> Hmm, I'm not sure why we would limit memory.oom_group to any policy.  Even
> if we are selecting a process, even without selecting cgroups as victims,
> killing a process may still render an entire cgroup useless and it makes
> sense to kill all processes in that cgroup.  If an unlucky process is
> selected with today's heuristic of oom_badness() or with a "none" policy
> with my patchset, I don't see why we can't enable the user to kill all
> other processes in the cgroup.  It may not make sense for some trees, but
> I think it could be useful for others.

My intuition screams here. I will think about this some more, but I would
be really curious about any sensible use case where you would want to
sacrifice the whole gang just because one process is too large compared
to other processes or cgroups. Do you see how you are mixing entities
here?

> > > Right, a policy of "none" reverts its subtree back to per-process
> > > comparison if you are either not using the cgroup aware oom killer or your
> > > subtree is not using the cgroup aware oom killer.
> >
> > So how are you going to compare none cgroups with those that consider
> > the full memcg or hierarchy (cgroup, tree)? Are you going to consider
> > oom_score_adj?
> >
>
> No, I think it would make sense to make the restriction that to set
> "none", the ancestor mem cgroups would also need the same policy,

I do not understand. Let's get back to our example. Are you saying that G
with none will enforce the none policy on C and root? If yes, then this
doesn't make any sense, because you would not really be able to delegate
the oom policy down the tree at all. It would effectively make the tree
policy pointless.

I am skipping the rest of the following text because it is picking on
details and the whole design is not clear to me. So could you start over
by documenting the semantics and requirements? Ideally by describing:
- how the policy on the root of the OOM hierarchy controls the selection
  policy
- how the per-memcg policy acts during the tree walk, for both
  intermediate nodes and leaves
- how the oom killer acts based on the selected memcg
- how you compare tasks with memcgs

[...]
-- 
Michal Hocko
SUSE Labs