The oom killer's goal is to kill a memory-hogging task so that it may
exit, free its memory, and allow the current context to allocate the
memory that triggered it in the first place. Thus, killing a task is
pointless if other threads sharing its mm cannot be killed because of
their /proc/pid/oom_adj or /proc/pid/oom_score_adj values.

This patch checks all user threads on the system to determine whether
oom_badness(p) should return 0 for p, which means it should not be
killed. If a thread shares p's mm and is unkillable, p is considered to
be unkillable as well.

Kthreads are not considered for this rule since they only temporarily
assume a task's mm via use_mm().

Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
---
 v2: change do_each_thread() to for_each_process() as suggested by Oleg.

 It's actually not possible to move this logic to oom_kill_task()
 because it's racy: oom_badness() is not a constant score and depends
 on the state of the VM when it is called. Doing so could unnecessarily
 panic the machine in that case, as well as when the same child to
 sacrifice is repeatedly selected in oom_kill_process() based on the
 parent's badness score.

 mm/oom_kill.c |   28 +++++++++++++++++++++-------
 1 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -83,6 +83,25 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
 #endif /* CONFIG_NUMA */
 
 /*
+ * Determines whether an mm is unfreeable since a user thread attached to
+ * it cannot be killed. Kthreads only temporarily assume a thread's mm,
+ * so they are not considered.
+ *
+ * mm need not be protected by task_lock() since it will not be
+ * dereferenced.
+ */
+static bool is_mm_unfreeable(struct mm_struct *mm)
+{
+	struct task_struct *p;
+
+	for_each_process(p)
+		if (p->mm == mm && !(p->flags & PF_KTHREAD) &&
+		    p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
+			return true;
+	return false;
+}
+
+/*
  * If this is a system OOM (not a memcg OOM) and the task selected to be
  * killed is not already running at high (RT) priorities, speed up the
  * recovery by boosting the dying task to the lowest FIFO priority.
@@ -160,12 +179,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	p = find_lock_task_mm(p);
 	if (!p)
 		return 0;
-
-	/*
-	 * Shortcut check for OOM_SCORE_ADJ_MIN so the entire heuristic doesn't
-	 * need to be executed for something that cannot be killed.
-	 */
-	if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
+	if (is_mm_unfreeable(p->mm)) {
 		task_unlock(p);
 		return 0;
 	}
@@ -675,7 +689,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	read_lock(&tasklist_lock);
 	if (sysctl_oom_kill_allocating_task &&
 	    !oom_unkillable_task(current, NULL, nodemask) &&
-	    (current->signal->oom_adj != OOM_DISABLE)) {
+	    !is_mm_unfreeable(current->mm)) {
 		/*
 		 * oom_kill_process() needs tasklist_lock held. If it returns
 		 * non-zero, current could not be killed so we must fallback to