I only noticed your email now, after replying to v3, so our emails have crossed.

On Fri 14-01-22 09:39:55, Joel Savitz wrote:
> > What has happened to the oom victim and why has it never exited?
>
> What appears to happen is that the oom victim is sent SIGKILL by the
> process that triggers the oom while also being marked as an oom
> victim.
>
> As you mention in your patchset introducing the oom reaper in commit
> aac4536355496 ("mm, oom: introduce oom reaper"), the purpose of the
> oom reaper is to try to free more memory, more quickly than would
> otherwise happen, by assuming that anonymous or swapped-out pages
> won't be needed in the exit path because the owner is already dying.
> However, this assumption is violated by the futex_cleanup() path,
> which needs access to userspace in fetch_robust_entry() when it is
> called in exit_robust_list(). Trace_printk()s in this failure path
> reveal an apparent race between the oom reaper thread reaping the
> victim's mm and the futex_cleanup() path. There may be other ways that
> this race manifests, but this is the one we have been able to trace
> most consistently.

Please let's continue the discussion in the v3 email thread:
http://lkml.kernel.org/r/20220114180135.83308-1-npache@xxxxxxxxxx

-- 
Michal Hocko
SUSE Labs