On Fri 20-07-18 13:28:56, David Rientjes wrote:
> On Fri, 20 Jul 2018, Tejun Heo wrote:
>
> > > process chosen for oom kill. I know that you care about the latter. My
> > > *only* suggestion was for the tunable to take a string instead of a
> > > boolean so it is extensible for future use. This seems like something so
> > > trivial.
> >
> > So, I'd much prefer it as boolean. It's a fundamentally binary
> > property, either handle the cgroup as a unit when chosen as oom victim
> > or not, nothing more.
>
> With the single hierarchy mandate of cgroup v2, the need arises to
> separate processes from a single job into subcontainers for use with
> controllers other than mem cgroup. In that case, we have no functionality
> to oom kill all processes in the subtree.
>
> A boolean can kill all processes attached to the victim's mem cgroup, but
> cannot kill all processes in a subtree if the limit of a common ancestor
> is reached. The common ancestor is needed to enforce a single memory
> limit but allow for processes to be constrained separately with other
> controllers.

I think you misunderstood the proposed semantic. oom.group is a property
of any (including inner-node) memcg. Once it is set, all the processes in
its domain are killed in one go because they are considered an indivisible
workload. Note how this doesn't tell anything about _how_ we select a
victim. That is not important and is in fact an implementation detail. All
we care about is that a selected victim is a part of an indivisible
workload and we have to tear down all of it. Future extensions can talk
more about how we select the victim, but the fundamental property of
whether a group is an indivisible workload or a group of semi-related
processes is a 0/1 choice IMHO.

Now there are still questions to iron out for that model. E.g. should we
allow a subtree of an oom.group == 1 memcg to have oom.group == 0? In
other words, something would be an indivisible workload for one OOM
context while it is not for a more restrictive OOM scope.
If yes, then what is the use case?

--
Michal Hocko
SUSE Labs
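The oom.group semantic discussed in this thread can be sketched in C. This is a minimal, hypothetical model, not kernel code: `struct memcg`, `oom_group_to_kill()` and `in_group()` are names invented for illustration. Victim selection itself is left out, matching the point that it is an implementation detail. Once a victim is chosen, the sketch walks from its memcg up to the memcg whose limit was hit (the OOM domain) and returns the memcg whose whole subtree should be torn down. Picking the highest such ancestor, rather than the nearest, is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a memcg node; not kernel code. */
struct memcg {
    const char *name;
    struct memcg *parent;   /* NULL for the root */
    bool oom_group;         /* models the memory.oom.group knob (0/1) */
};

/*
 * Walk from the victim's memcg towards the root, stopping at the memcg
 * whose limit was hit (oom_domain; NULL means walk to the root). Return
 * the highest memcg on that path with oom_group set, i.e. the indivisible
 * workload to tear down, or NULL if only the selected task is killed.
 */
static struct memcg *oom_group_to_kill(struct memcg *victim,
                                       const struct memcg *oom_domain)
{
    struct memcg *group = NULL;

    assert(victim != NULL);
    for (struct memcg *m = victim; m != NULL; m = m->parent) {
        if (m->oom_group)
            group = m;      /* keep climbing: a higher group wins */
        if (m == oom_domain)
            break;          /* ancestors outside the OOM scope are ignored */
    }
    return group;
}

/* True if memcg m is `group` itself or one of its descendants. */
static bool in_group(const struct memcg *m, const struct memcg *group)
{
    for (; m != NULL; m = m->parent)
        if (m == group)
            return true;
    return false;
}
```

With a hierarchy root -> job (oom.group = 1) -> job/a, job/b (oom.group = 0), an OOM at the root or at job's limit returns job, so tasks in both job/a and job/b are killed together. An OOM triggered by job/a's own limit puts job outside the scope, so only the selected task dies. That is exactly the "indivisible for one OOM context but not for a more restrictive one" situation raised above.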