Tetsuo Handa wrote:
> If we can tolerate lack of process name and its pid when reporting
> success/failure (or we pass them via mm_struct or walk the process list or
> whatever else), I think we can do something like the patch below (mostly a
> revert of "oom: clear TIF_MEMDIE after oom_reaper managed to unmap the
> address space").
>
>  	if (attempts > MAX_OOM_REAP_RETRIES) {
> -		pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
> -			task_pid_nr(tsk), tsk->comm);
> +		pr_info("oom_reaper: unable to reap memory\n");
>  		debug_show_all_locks();
>  	}
>

The possible causes of oom_reap_vmas() being unable to reap memory are
limited to:

  Somebody was waiting at down_write(&mm->mmap_sem) (where converting it
  to down_write_killable(&mm->mmap_sem) helps).

or

  Somebody was waiting on an unkillable lock between
  down_write(&mm->mmap_sem) and up_write(&mm->mmap_sem) (where we will
  need to convert such locks to killable ones).

or

  Somebody was doing a !__GFP_FS && !__GFP_NOFAIL allocation between
  down_write(&mm->mmap_sem) and up_write(&mm->mmap_sem), and was unable
  to call out_of_memory() in order to acquire TIF_MEMDIE (where setting
  TIF_MEMDIE on all threads using that mm from oom_kill_process() helps).

or

  Somebody was doing a __GFP_FS || __GFP_NOFAIL allocation between
  down_write(&mm->mmap_sem) and up_write(&mm->mmap_sem), but was unable
  to call out_of_memory() for more than one second due to oom_lock
  contention and/or scheduling priority (where setting TIF_MEMDIE on all
  threads using that mm from oom_kill_process() helps).

Therefore, we want to check the traces of the threads using that mm rather
than the locks held by all threads. In addition, CONFIG_PROVE_LOCKING is not
enabled on most production systems, so debug_show_all_locks() tells us
little there. I think the patch below is more helpful than
debug_show_all_locks(). (That said, the kmallocwd patch would report the
"unable to reap mm" case, the "unable to leave too_many_isolated() loop"
case, and any other not-yet-identified cases that stall memory allocation.)
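The second through fourth cases share one shape: a writer keeps mmap_sem across the whole retry window, so every non-blocking attempt by the reaper fails. Below is a minimal userspace model of that situation. It is not kernel code and not part of the patch; it assumes a POSIX pthread_rwlock_t stands in for mm->mmap_sem and models the reaper as a read trylock with bounded retries. try_reap_with_retries() and REAP_RETRIES are hypothetical names, and the retry count of 10 is an assumption chosen so that 10 x 100 ms (HZ/10) comes to about one second.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical retry budget. With a 100 ms sleep (HZ/10 in the patch
 * context) the model gives up after roughly one second.
 */
#define REAP_RETRIES 10

/*
 * Userspace model of the reaper loop, NOT kernel code: a pthread rwlock
 * stands in for mm->mmap_sem. Try to take the lock for reading without
 * blocking, sleep between failed attempts, and give up after max_retries
 * attempts. Returns true if the lock was obtained ("memory was reaped").
 */
static bool try_reap_with_retries(pthread_rwlock_t *sem, int max_retries,
                                  long interval_ns)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = interval_ns };

    /* tv_nsec must lie in [0, 1e9) for nanosleep() to accept it. */
    assert(interval_ns >= 0 && interval_ns < 1000000000L);

    for (int attempt = 0; attempt < max_retries; attempt++) {
        if (pthread_rwlock_tryrdlock(sem) == 0) {
            /* The real reaper would unmap the victim's memory here. */
            pthread_rwlock_unlock(sem);
            return true;
        }
        /* Stands in for schedule_timeout_idle(HZ/10). */
        nanosleep(&ts, NULL);
    }
    /* A writer held the lock for the whole retry budget. */
    return false;
}
```

With the lock write-held, a call such as try_reap_with_retries(&sem, REAP_RETRIES, 100000000L) returns false after roughly one second; once the writer calls pthread_rwlock_unlock(), the next call succeeds. The model only shows why the reaper gives up; it says nothing about which thread holds the lock or why, which is what per-thread traces are meant to reveal.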
----------
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 2199c71..affbb79 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -502,8 +502,26 @@ static void oom_reap_vmas(struct mm_struct *mm)
 		schedule_timeout_idle(HZ/10);
 
 	if (attempts > MAX_OOM_REAP_RETRIES) {
+		struct task_struct *p;
+		struct task_struct *t;
+
 		pr_info("oom_reaper: unable to reap memory\n");
-		debug_show_all_locks();
+		rcu_read_lock();
+		for_each_process_thread(p, t) {
+			if (likely(t->mm != mm))
+				continue;
+			pr_info("oom_reaper: %s(%d) flags=0x%x%s%s%s%s\n",
+				t->comm, t->pid, t->flags,
+				(t->state & TASK_UNINTERRUPTIBLE) ?
+				" uninterruptible" : "",
+				(t->flags & PF_EXITING) ? " exiting" : "",
+				fatal_signal_pending(t) ? " dying" : "",
+				test_tsk_thread_flag(t, TIF_MEMDIE) ?
+				" victim" : "");
+			sched_show_task(t);
+			debug_show_held_locks(t);
+		}
+		rcu_read_unlock();
 	}
 
 	/* Drop a reference taken by wake_oom_reaper */
----------

Well, I think we could define a CONFIG_OOM_REAPER option that defaults to y
and depends on CONFIG_MMU, rather than scattering CONFIG_MMU checks around.
That would help catch build failures in the CONFIG_MMU=n case...