I've got lost in the follow up patches. Could you post only rework of this particular one, please? You have already noticed that nodemask check is tricky and the memcg check would be tricky for different reason. On Sun 03-07-16 11:38:20, Tetsuo Handa wrote: [...] > +bool oom_has_pending_mm(struct mem_cgroup *memcg, const nodemask_t *nodemask) > +{ > + struct oom_mm *mm; > + bool ret = false; > + > + spin_lock(&oom_mm_lock); > + list_for_each_entry(mm, &oom_mm_list, list) { > + if (memcg && mm->memcg != memcg) > + continue; > + if (nodemask && mm->nodemask != nodemask) > + continue; > + ret = true; > + break; > + } > + spin_unlock(&oom_mm_lock); > + return ret; > +} > + > enum oom_scan_t oom_scan_process_thread(struct oom_control *oc, > struct task_struct *task) > { [...] > @@ -647,18 +668,37 @@ subsys_initcall(oom_init) > /** > * mark_oom_victim - mark the given task as OOM victim > * @tsk: task to mark > + * @oc: oom_control > * > * Has to be called with oom_lock held and never after > * oom has been disabled already. > */ > -void mark_oom_victim(struct task_struct *tsk) > +void mark_oom_victim(struct task_struct *tsk, struct oom_control *oc) > { > + struct mm_struct *mm = tsk->mm; > + > WARN_ON(oom_killer_disabled); > /* OOM killer might race with memcg OOM */ > if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE)) > return; > atomic_inc(&tsk->signal->oom_victims); > /* > + * Since mark_oom_victim() is called from multiple threads, > + * connect this mm to oom_mm_list only if not yet connected. > + * > + * Since mark_oom_victim() is called with a stable mm (i.e. > + * mm->mm_userst > 0), exit_oom_mm() from __mmput() can't be called > + * before we add this mm to the list. > + */ > + spin_lock(&oom_mm_lock); > + if (!mm->oom_mm.list.next) { > + atomic_inc(&mm->mm_count); > + mm->oom_mm.memcg = oc->memcg; > + mm->oom_mm.nodemask = oc->nodemask; > + list_add_tail(&mm->oom_mm.list, &oom_mm_list); > + } Here you are storing the memcg where the OOM killer happened but later on we might encounter an OOM on upper level of the memcg hierarchy and we want to prevent from the oom killer if there is an mm which hasn't released a memory from the lower level of the hierarchy. E.g. A / \ B C C hits a limit and invokes oom. Then we hit oom on A because of a charge for B. Now for_each_mem_cgroup_tree would encounter such a task/mm and abort the selection. With a pure memcg pointer check we would skip that mm. This would be fixable by doing mem_cgroup_is_descendant but that requires an alive memcg's css so you have to pin it. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>