> What has happened to the oom victim and why it has never exited? What appears to happen is that the oom victim is sent SIGKILL by the process that triggers the oom while also being marked as an oom victim. As you mention in your patchset introducing the oom reaper in commit aac4536355496 ("mm, oom: introduce oom reaper"), the purpose the the oom reaper is to try and free more memory more quickly than it otherwise would have been by assuming anonymous or swapped out pages won't be needed in the exit path as the owner is already dying. However, this assumption is violated by the futex_cleanup() path, which needs access to userspace in fetch_robust_entry() when it is called in exit_robust_list(). Trace_printk()s in this failure path reveal an apparent race between the oom reaper thread reaping the victim's mm and the futex_cleanup() path. There may be other ways that this race manifests but we have been most consistently able to trace that one. Since in the case of an oom victim using robust futexes the core assumption of the oom reaper is violated, we propose to solve this problem by either canceling or delaying the waking of the oom reaper thread by wake_oom_reaper in the case that tsk->robust_list is non-NULL. e.g. the bug does not reproduce with this patch (from npache@xxxxxxxxxx): diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 989f35a2bbb1..b8c518fdcf4d 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -665,6 +665,19 @@ static void wake_oom_reaper(struct task_struct *tsk) if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags)) return; +#ifdef CONFIG_FUTEX + /* + * don't wake the oom_reaper thread if we still have a robust list to handle + * This will then rely on the sigkill to handle the cleanup of memory + */ + if(tsk->robust_list) + return; +#ifdef CONFIG_COMPAT + if(tsk->compat_robust_list) + return; +#endif +#endif + get_task_struct(tsk); spin_lock(&oom_reaper_lock); Best, Joel Savitz