On Thu, 3 Aug 2017 08:55:04 +0900 Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:

> Manish Jaggi noticed that running LTP oom01/oom02 ltp tests with high core
> count causes random kernel panics when an OOM victim which consumed memory
> in a way the OOM reaper does not help was selected by the OOM killer.
>
> ...
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -652,6 +652,7 @@ struct task_struct {
>  	/* disallow userland-initiated cgroup migration */
>  	unsigned			no_cgroup_migration:1;
>  #endif
> +	unsigned			oom_kill_free_check_raced:1;
> 
>  	unsigned long			atomic_flags; /* Flags requiring atomic access. */
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 9e8b4f0..a1ae78d 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -780,11 +780,19 @@ static bool task_will_free_mem(struct task_struct *task)
>  		return false;
> 
>  	/*
> -	 * This task has already been drained by the oom reaper so there are
> -	 * only small chances it will free some more
> +	 * It is possible that current thread fails to try allocation from
> +	 * memory reserves if the OOM reaper set MMF_OOM_SKIP on this mm before
> +	 * current thread calls out_of_memory() in order to get TIF_MEMDIE.
> +	 * In that case, allow current thread to try TIF_MEMDIE allocation
> +	 * before start selecting next OOM victims.
>  	 */
> -	if (test_bit(MMF_OOM_SKIP, &mm->flags))
> +	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
> +		if (task == current && !task->oom_kill_free_check_raced) {
> +			task->oom_kill_free_check_raced = true;

OK, caller's task_lock() prevents races here.

nit: task->oom_kill_free_check_raced is `unsigned', so " = 1" would be
more truthful here...

> +			return true;
> +		}
>  		return false;
> +	}
> 
>  	if (atomic_read(&mm->mm_users) <= 1)
>  		return true;
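A minimal userspace model of the patched MMF_OOM_SKIP branch, assuming simplified stand-ins: `struct model_task`, `model_skip_branch()`, and a plain `bool` in place of the `test_bit(MMF_OOM_SKIP, &mm->flags)` check are all hypothetical, not kernel API. It sketches the one-shot behaviour: once the OOM reaper has set MMF_OOM_SKIP, the current task gets a single `true` (so it can try a TIF_MEMDIE allocation before a new victim is selected), and every later call returns `false`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the fields the patch touches;
 * this is not the kernel's task_struct or mm_struct.
 */
struct model_task {
	bool mm_oom_skip;                     /* stands in for test_bit(MMF_OOM_SKIP, &mm->flags) */
	unsigned oom_kill_free_check_raced:1; /* one-bit flag, as added to task_struct */
};

/*
 * Models only the patched MMF_OOM_SKIP branch of task_will_free_mem().
 * The first time the current task finds its mm already marked
 * MMF_OOM_SKIP, it returns true once and records that in the one-bit
 * flag; any later call, or a call for a task other than current,
 * returns false. The caller is assumed to serialize access, as
 * task_lock() does in the kernel.
 */
static bool model_skip_branch(struct model_task *task, bool is_current)
{
	assert(task->mm_oom_skip); /* only reached once MMF_OOM_SKIP is set */

	if (is_current && !task->oom_kill_free_check_raced) {
		task->oom_kill_free_check_raced = 1; /* "= 1", per the review nit */
		return true;
	}
	return false;
}
```

Calling `model_skip_branch()` twice on the same current task returns true and then false; for a non-current task it returns false and leaves the flag untouched. Because `true` converts to 1, assigning `true` or `1` stores the same value in a one-bit `unsigned` field; writing `= 1` simply says so directly.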