On Wed 17-12-14 20:54:53, Tetsuo Handa wrote: [...] > I'm not familiar with memcg. This check doesn't make any sense for this path because the task is part of the memcg, otherwise it wouldn't trigger charge for it and couldn't cause the OOM killer. Kernel threads do not have their address space they cannot trigger memcg OOM killer. As you provide NULL nodemask then this is basically a check for task being part of the memcg. The check for current->mm is not needed as well because task will not trigger a charge after exit_mm. > But I think the condition whether TIF_MEMDIE > flag should be set or not should be same between the memcg OOM killer and > the global OOM killer, for a thread inside some memcg with TIF_MEMDIE flag > can prevent the global OOM killer from killing other threads when the memcg > OOM killer and the global OOM killer run concurrently (the worst corner case). > When a malicious user runs a memory consumer program which triggers memcg OOM > killer deadlock inside some memcg, it will result in the global OOM killer > deadlock when the global OOM killer is triggered by other user's tasks. Hope that the above exaplains your concerns here. > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > > index 481d550..01719d6 100644 > > > --- a/mm/oom_kill.c > > > +++ b/mm/oom_kill.c > > [...] > > > @@ -649,8 +649,14 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, > > > * If current has a pending SIGKILL or is exiting, then automatically > > > * select it. The goal is to allow it to allocate so that it may > > > * quickly exit and free its memory. > > > + * > > > + * However, if current is calling out_of_memory() by doing memory > > > + * allocation from e.g. exit_task_work() in do_exit() after PF_EXITING > > > + * was set by exit_signals() and mm was released by exit_mm(), it is > > > + * wrong to expect current to exit and free its memory quickly. > > > */ > > > - if (fatal_signal_pending(current) || task_will_free_mem(current)) { > > > + if ((fatal_signal_pending(current) || task_will_free_mem(current)) && > > > + current->mm && !oom_unkillable_task(current, NULL, nodemask)) { > > > set_thread_flag(TIF_MEMDIE); > > > return; > > > } > > > > Calling oom_unkillable_task doesn't make much sense to me. Even if it made > > sense it should be in a separate patch, no? > > At least for the global OOM case, current may be a kernel thread, doesn't it? then mm would be NULL most of the time so current->mm check wouldn't give it TIF_MEMDIE and the task itself will be exluded later on during tasks scanning. > Such kernel thread can do memory allocation from exit_task_work(), and trigger > the global OOM killer, and disable the global OOM killer and prevent other > threads from allocating memory, can't it? > > We can utilize memcg for reducing the possibility of triggering the global > OOM killer. I do not get this. Memcg charge happens after the allocation is done so the global OOM killer would trigger before memcg one. > But if we failed to prevent the global OOM killer from triggering, > the global OOM killer is responsible for solving the OOM condition than keeping > the system stalled for presumably forever. Panic on TIF_MEMDIE timeout can act > like /proc/sys/vm/panic_on_oom only when the OOM killer chose (by chance or > by a trap) an unkillable (due to e.g. lock dependency loop) task. Of course, > for those who prefer the system kept stalled over the OOM condition solved, > such action should be optional and thus I'm happy to propose sysctl-tunable > version. You are getting offtopic again (which is pretty annoying to be honest as it is going all over again and again). Please focus on a single thing at a time. > I think that > > if (!task->mm && test_tsk_thread_flag(task, TIF_MEMDIE)) > return true; > > check should be added to oom_unkillable_task() because mm-less thread can > release little memory (except invisible memory if any). Why do you think this makes more sense than handling this very special case in out_of_memory? I really do not see any reason to to make oom_unkillable_task more complicated. [...] -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>