Hi Johannes! Thank you for review! I do agree with most of the comments, and I will address them in v6. I'll post it soon. Please, find some comments below. On Tue, Aug 22, 2017 at 01:03:44PM -0400, Johannes Weiner wrote: > Hi Roman, > > great work! This looks mostly good to me now. Below are some nitpicks > concerning naming and code layout, but nothing major. > > > + > > + css_task_iter_start(&memcg->css, 0, &it); > > + while ((task = css_task_iter_next(&it))) { > > + /* > > + * If there are no tasks, or all tasks have oom_score_adj set > > + * to OOM_SCORE_ADJ_MIN and oom_kill_all_tasks is not set, > > + * don't select this memory cgroup. > > + */ > > + if (!elegible && > > + (memcg->oom_kill_all_tasks || > > + task->signal->oom_score_adj != OOM_SCORE_ADJ_MIN)) > > + elegible = 1; > > This is a little awkward to read. How about something like this: > > /* > * When killing individual tasks, we respect OOM score adjustments: > * at least one task in the group needs to be killable for the group > * to be oomable. > * > * Also check that previous OOM kills have finished, and abort if > * there are any pending OOM victims. > */ > oomable = memcg->oom_kill_all_tasks; > while ((task = css_task_iter_next(&it))) { > if (!oomable && task->signal_oom_score_adj != OOM_SCORE_ADJ_MIN) > oomable = 1; > > > + if (tsk_is_oom_victim(task) && > > + !test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags)) { > > + elegible = -1; > > + break; > > + } > > + } > > + css_task_iter_end(&it); We ignore oom_score_adj if oom_kill_all_tasks is set, it's not reflected in your version. Anyway, I've moved the comments block outside and rephrased it to make more clear. > > etc. > > > + > > + return elegible > 0 ? memcg_oom_badness(memcg, nodemask) : elegible; > > I find these much easier to read if broken up, even if it's more LOC: > > if (eligible <= 0) > return eligible; > > return memcg_oom_badness(memcg, nodemask); > > > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc) > > +{ > > + struct mem_cgroup *iter, *parent; > > + > > + for_each_mem_cgroup_tree(iter, root) { > > + if (memcg_has_children(iter)) { > > + iter->oom_score = 0; > > + continue; > > + } > > + > > + iter->oom_score = oom_evaluate_memcg(iter, oc->nodemask); > > + if (iter->oom_score == -1) { > > Please add comments to document the special returns. Maybe #defines > would be clearer, too. > > > + oc->chosen_memcg = (void *)-1UL; > > + mem_cgroup_iter_break(root, iter); > > + return; > > + } > > + > > + if (!iter->oom_score) > > + continue; > > Same here. > > Maybe a switch would be suitable to handle the abort/no-score cases. Not sure about switch/defines, but I've added several comment blocks to describe possible return values, as well as their handling. Hope, it will be enough. > > static int memory_events_show(struct seq_file *m, void *v) > > { > > struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m)); > > @@ -5310,6 +5512,12 @@ static struct cftype memory_files[] = { > > .write = memory_max_write, > > }, > > { > > + .name = "oom_kill_all_tasks", > > + .flags = CFTYPE_NOT_ON_ROOT, > > + .seq_show = memory_oom_kill_all_tasks_show, > > + .write = memory_oom_kill_all_tasks_write, > > + }, > > This name is quite a mouthful and reminiscent of the awkward v1 > interface names. It doesn't really go well with the v2 names. > > How about memory.oom_group? I'd prefer to have something more obvious. I've renamed memory.oom_kill_all_tasks to memory.oom_kill_all, which was earlier suggested by Vladimir. Are you ok with it? Thanks! -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html