On Fri, Jul 20, 2018 at 01:28:56PM -0700, David Rientjes wrote:
> On Fri, 20 Jul 2018, Tejun Heo wrote:
>
> > > process chosen for oom kill. I know that you care about the latter. My
> > > *only* suggestion was for the tunable to take a string instead of a
> > > boolean so it is extensible for future use. This seems like something so
> > > trivial.
> >
> > So, I'd much prefer it as boolean. It's a fundamentally binary
> > property, either handle the cgroup as a unit when chosen as oom victim
> > or not, nothing more.
>
> With the single hierarchy mandate of cgroup v2, the need arises to
> separate processes from a single job into subcontainers for use with
> controllers other than mem cgroup. In that case, we have no functionality
> to oom kill all processes in the subtree.
>
> A boolean can kill all processes attached to the victim's mem cgroup, but
> cannot kill all processes in a subtree if the limit of a common ancestor
> is reached.

Why so?

Once again, my proposal:
as soon as the OOM killer has selected a victim task,
we look at the victim task's memory cgroup.
If memory.oom.group is not set, we're done.
Otherwise, we traverse the memory cgroup tree upwards, towards
the OOMing cgroup (or root), for as long as memory.oom.group is set.
We then kill the last cgroup reached entirely (including all its children).

Please note: we do not look at memory.oom.group of the OOMing cgroup;
we look at the memcg of the victim task.

If this model doesn't work well for your case,
please describe it with an example. I'm not sure I understand
your problem anymore.

Thanks!
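
To make the selection rule above concrete, here is a minimal Python model of it. This is only a sketch, not kernel code: `Memcg`, `select_kill_scope` and `tasks_in_subtree` are hypothetical names made up for illustration, and the tree at the bottom is an invented example of a job split into two subcontainers. It also assumes that the upward walk may end at the OOMing cgroup itself when memory.oom.group is set all the way up; the description leaves that boundary open.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Memcg:
    """Toy stand-in for a memory cgroup (hypothetical, not a kernel structure)."""
    name: str
    oom_group: bool = False                 # models memory.oom.group
    parent: Optional["Memcg"] = None
    children: List["Memcg"] = field(default_factory=list)
    tasks: List[str] = field(default_factory=list)

    def add_child(self, child: "Memcg") -> "Memcg":
        child.parent = self
        self.children.append(child)
        return child


def select_kill_scope(victim_memcg: Memcg, oom_memcg: Memcg) -> Optional[Memcg]:
    """Return the cgroup to kill as a unit, or None to kill only the victim task."""
    # The decision starts from the victim task's memcg, not the OOMing cgroup.
    if not victim_memcg.oom_group:
        return None
    scope = victim_memcg
    # Walk up while the parent also has memory.oom.group set.
    # Assumption: never go above the OOMing cgroup, but allow stopping on it.
    while (scope is not oom_memcg
           and scope.parent is not None
           and scope.parent.oom_group):
        scope = scope.parent
    return scope


def tasks_in_subtree(memcg: Memcg) -> List[str]:
    """Collect the tasks of a cgroup and of all its descendants."""
    tasks = list(memcg.tasks)
    for child in memcg.children:
        tasks.extend(tasks_in_subtree(child))
    return tasks


# Invented example: one job split into two subcontainers.
root = Memcg("root")
job = root.add_child(Memcg("job", oom_group=True))
job_a = job.add_child(Memcg("job/a", oom_group=True, tasks=["a1", "a2"]))
job_b = job.add_child(Memcg("job/b", oom_group=True, tasks=["b1"]))

# The limit of the common ancestor "job" is hit; the victim task lives in job/a.
scope = select_kill_scope(victim_memcg=job_a, oom_memcg=job)
doomed = tasks_in_subtree(scope) if scope is not None else []
```

In this model, with memory.oom.group set on the job and on both subcontainers, an OOM at the job's limit with a victim in job/a selects the whole job, so the tasks in job/b are killed as well. Clearing the flag on job/a instead makes `select_kill_scope` return `None`, meaning only the victim task is killed.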