On Fri 24-06-16 19:56:43, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Fri 24-06-16 01:24:46, Tetsuo Handa wrote: > > > I missed that victim != p case needs to use get_task_struct(). Patch updated. > > > ---------------------------------------- > > > >From 1819ec63b27df2d544f66482439e754d084cebed Mon Sep 17 00:00:00 2001 > > > From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> > > > Date: Fri, 24 Jun 2016 01:16:02 +0900 > > > Subject: [PATCH v2] mm, oom: don't set TIF_MEMDIE on a mm-less thread. > > > > > > Patch "mm, oom: fortify task_will_free_mem" removed p->mm != NULL test for > > > shortcut path in oom_kill_process(). But since commit f44666b04605d1c7 > > > ("mm,oom: speed up select_bad_process() loop") changed to iterate using > > > thread group leaders, the possibility of p->mm == NULL has increased > > > compared to when commit 83363b917a2982dd ("oom: make sure that TIF_MEMDIE > > > is set under task_lock") was proposed. On CONFIG_MMU=n kernels, nothing > > > will clear TIF_MEMDIE and the system can OOM livelock if TIF_MEMDIE was > > > by error set to a mm-less thread group leader. > > > > > > Let's do steps for regular path except printing OOM killer messages and > > > sending SIGKILL. > > > > I fully agree with Oleg. It would be much better to encapsulate this > > into mark_oom_victim and guard it by ifdef NOMMU as this is nommu > > specific with a big fat warning why we need it. > > OK. But before doing so, which one ((A) or (B) shown below) do you prefer? > > > (A) Don't use task_will_free_mem(p) shortcut in oom_kill_process() if CONFIG_MMU=n. > > Since task_will_free_mem(p) == true where p is the largest memory consumer > (with oom_score_adj taken into account) is not exiting smoothly, as with > commit 6a618957ad17d8f4 ("mm: oom_kill: don't ignore oom score on exiting > tasks") thought, it can be a sign of something bad (possibly OOM livelock) is > happening. Thus, print the OOM killer messages anyway although all tasks > which will be OOM killed are already killed/exiting (unless p has OOM killable > children). This will help giving administrator a hint when the kernel hit > OOM livelock. [...] > (B) Check mm in mark_oom_victim() if CONFIG_MMU=n. > > Since mark_oom_victim() is also called from current->mm && task_will_free_mem(current) > shortcut in out_of_memory(), mark_oom_victim(current) needs to set TIF_MEMDIE on current > if current->mm != NULL. I think you are overcomplicating this. Why cannot we simply do the following? --- diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 4c21f744daa6..97be9324a58b 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -671,6 +671,22 @@ void mark_oom_victim(struct task_struct *tsk) /* OOM killer might race with memcg OOM */ if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE)) return; +#ifndef CONFIG_MMU + /* + * we shouldn't risk setting TIF_MEMDIE on a task which has passed its + * exit_mm task->mm = NULL and exit_oom_victim otherwise it could + * theoretically keep its TIF_MEMDIE for ever while waiting for a parent + * to get it out of zombie state. MMU doesn't have this problem because + * it has the oom_reaper to clear the flag asynchronously. + */ + task_lock(tsk); + if (!tsk->mm) { + clear_tsk_thread_flag(tsk, TIF_MEMDIE); + task_unlock(tsk); + return; + } + taks_unlock(tsk); +#endif atomic_inc(&tsk->signal->oom_victims); /* * Make sure that the task is woken up from uninterruptible sleep -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>