On Wed, Oct 31, 2012 at 9:48 PM, David Rientjes <rientjes@xxxxxxxxxx> wrote: > On Thu, 1 Nov 2012, Minchan Kim wrote: > >> It's not true any more. >> 3.6 includes following code in try_to_free_pages >> >> /* >> * Do not enter reclaim if fatal signal is pending. 1 is returned so >> * that the page allocator does not consider triggering OOM >> */ >> if (fatal_signal_pending(current)) >> return 1; >> >> So the hunged task never go to the OOM path and could be looping forever. >> > > Ah, interesting. This is from commit 5515061d22f0 ("mm: throttle direct > reclaimers if PF_MEMALLOC reserves are low and swap is backed by network > storage"). Thanks for adding Mel to the cc. > > The oom killer specifically has logic for this condition: when calling > out_of_memory() the first thing it does is > > if (fatal_signal_pending(current)) > set_thread_flag(TIF_MEMDIE); > > to allow it access to memory reserves so that it may exit if it's having > trouble. But that ends up never happening because of the above code that > Minchan has identified. > > So we either need to do set_thread_flag(TIF_MEMDIE) in try_to_free_pages() > as well or revert that early return entirely; there's no justification > given for it in the comment nor in the commit log. I'd rather remove it > and allow the oom killer to trigger and grant access to memory reserves > itself if necessary. > > Mel, how does commit 5515061d22f0 deal with threads looping forever if > they need memory in the exit path since the oom killer never gets called? > > That aside, it doesn't seem like this is the issue that Luigi is reporting > since his patch that avoids deferring the oom killer presumably fixes the > issue for him. So it turns out the oom killer must be getting called. > > Luigi, can you try this instead? It applies to the latest git but should > be easily modified to apply to any 3.x kernel you're running. > --- > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -310,26 +310,13 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task, > if (!task->mm) > return OOM_SCAN_CONTINUE; > > - if (task->flags & PF_EXITING) { > + if (task->flags & PF_EXITING && !force_kill) { > /* > - * If task is current and is in the process of releasing memory, > - * allow the "kill" to set TIF_MEMDIE, which will allow it to > - * access memory reserves. Otherwise, it may stall forever. > - * > - * The iteration isn't broken here, however, in case other > - * threads are found to have already been oom killed. > + * If this task is not being ptraced on exit, then wait for it > + * to finish before killing some other task unnecessarily. > */ > - if (task == current) > - return OOM_SCAN_SELECT; > - else if (!force_kill) { > - /* > - * If this task is not being ptraced on exit, then wait > - * for it to finish before killing some other task > - * unnecessarily. > - */ > - if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) > - return OOM_SCAN_ABORT; > - } > + if (!(task->group_leader->ptrace & PT_TRACE_EXIT)) > + return OOM_SCAN_ABORT; > } > return OOM_SCAN_OK; > } > @@ -706,11 +693,11 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, > return; > > /* > - * If current has a pending SIGKILL, then automatically select it. The > - * goal is to allow it to allocate so that it may quickly exit and free > - * its memory. > + * If current has a pending SIGKILL or is exiting, then automatically > + * select it. The goal is to allow it to allocate so that it may > + * quickly exit and free its memory. > */ > - if (fatal_signal_pending(current)) { > + if (fatal_signal_pending(current) || current->flags & PF_EXITING) { > set_thread_flag(TIF_MEMDIE); > return; > } I tested this change with my load and it appears to also prevent the deadlocks. I have a question though. I thought only one process was allowed to be in TIF_MEMDIE state, but I don't see anything that prevents this code (before or after the change) from setting the flag in multiple processes. Is this a problem? Thanks! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>