Michal Hocko wrote: > On Sun 31-05-15 20:10:23, Tetsuo Handa wrote: > [...] > > By the way, I got two mumbles. > > > > Is "If any of p's children has a different mm and is eligible for kill," logic > > in oom_kill_process() really needed? Didn't select_bad_process() which was > > called proior to calling oom_kill_process() already choose a best victim > > using for_each_process_thread() ? > > This tries to have smaller effect on the system. It tries to kill > younger tasks because this might be and quite often is sufficient to > resolve the OOM condition. > > > Is "/* mm cannot safely be dereferenced after task_unlock(victim) */" true? > > It seems to me that it should be "/* mm cannot safely be compared after > > task_unlock(victim) */" because it is theoretically possible to have > > > > CPU 0 CPU 1 CPU 2 > > task_unlock(victim); > > victim exits and releases mm. > > Usage count of the mm becomes 0 and thus released. > > New mm is allocated and assigned to some thread. > > (p->mm == mm) matches the recreated mm and kill unrelated p. > > > > sequence. We need to either get a reference to victim's mm before > > task_unlock(victim) or do comparison before task_unlock(victim). > > Hmm, I guess you are right. The race is theoretically possible, > especially when there are many tasks when iterating over the list might > take some time. reference to the mm would solve this. Care to send a > patch? > > -- > Michal Hocko > SUSE Labs > I see. Here is a patch. mmput() may sleep. But oom_kill_process() is a sleep-able context, right? ---------------------------------------- >From 15afd1f40b132719c323e81d58064ff7115206f9 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> Date: Mon, 1 Jun 2015 20:54:14 +0900 Subject: [PATCH] mm,oom: Fix potentially killing unrelated process or depleting memory. At the for_each_process() loop in oom_kill_process(), we are comparing address of OOM victim's mm without holding a reference to that mm. If there are a lot of processes to compare or a lot of "Kill process %d (%s) sharing same memory" messages to print, for_each_process() loop could take very long time. It is possible that meanwhile the OOM victim exits and releases its mm, and then mm is allocated with the same address and assigned to some unrelated process. When we hit such race, the unrelated process will be killed by error. To make sure that the OOM victim's mm does not go away until for_each_process() loop finishes, get a reference on the OOM victim's mm before calling task_unlock(victim). Likewise, move do_send_sig_info(SIGKILL, victim) to before mark_oom_victim(victim) in case for_each_process() took very long time, for the OOM victim can abuse ALLOC_NO_WATERMARKS by TIF_MEMDIE via e.g. memset() in user space until SIGKILL is delivered. Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> --- mm/oom_kill.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index dff991e..5eb1e65 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -559,14 +559,17 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, victim = p; } - /* mm cannot safely be dereferenced after task_unlock(victim) */ + /* Get a reference to safely compare mm after task_unlock(victim) */ mm = victim->mm; - mark_oom_victim(victim); + atomic_inc(&mm->mm_users); pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n", - task_pid_nr(victim), victim->comm, K(victim->mm->total_vm), - K(get_mm_counter(victim->mm, MM_ANONPAGES)), - K(get_mm_counter(victim->mm, MM_FILEPAGES))); + task_pid_nr(victim), victim->comm, K(mm->total_vm), + K(get_mm_counter(mm, MM_ANONPAGES)), + K(get_mm_counter(mm, MM_FILEPAGES))); task_unlock(victim); + /* Send SIGKILL before setting TIF_MEMDIE. */ + do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true); + mark_oom_victim(victim); /* * Kill all user processes sharing victim->mm in other thread groups, if @@ -592,7 +595,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order, } rcu_read_unlock(); - do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true); + mmput(mm); put_task_struct(victim); } #undef K -- 1.8.3.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>