On Tue 31-10-17 08:04:19, Shakeel Butt wrote:
> > +
> > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
> > +{
> > +	struct mem_cgroup *iter;
> > +
> > +	oc->chosen_memcg = NULL;
> > +	oc->chosen_points = 0;
> > +
> > +	/*
> > +	 * The oom_score is calculated for leaf memory cgroups (including
> > +	 * the root memcg).
> > +	 */
> > +	rcu_read_lock();
> > +	for_each_mem_cgroup_tree(iter, root) {
> > +		long score;
> > +
> > +		if (memcg_has_children(iter) && iter != root_mem_cgroup)
> > +			continue;
> > +
>
> Cgroup v2 does not support charge migration between memcgs. So, there
> can be intermediate nodes which may contain the major charge of the
> processes in their leaf descendants. Skipping such intermediate nodes
> will kind of protect such processes from the oom-killer (they end up
> lower on the list of candidates to be killed). Is it ok not to handle
> such a scenario? If yes, shouldn't we document it?

Yes, this is a real problem, and one which is not really solvable
without charge migration. You simply have no clue _who_ owns the
memory, so I assume that admins will need to set up the hierarchy so
that subgroups which migrate tasks are themselves the oom_group. Or we
might want to allow an opt-in for charge migration in v2. To be honest,
I wasn't completely happy about removing this functionality altogether
in v2, but there was strong pushback back then that relying on charge
migration doesn't have any sound use case.

Anyway, I agree that the documentation should be explicit about that.
-- 
Michal Hocko
SUSE Labs