The previous version of this RFC has been posted here [1]. I have fixed a
few issues spotted during the review and by the 0day bot. I have also
reworked patch 2 to be based on a ratio rather than on an absolute number.

With this series applied the locking protocol between the oom_reaper and
the exit path is as follows.

All parts which cannot race should use the exclusive lock on the exit
path. Once the exit path has passed the point after which no blocking
locks are taken, it clears mm->mmap under the exclusive lock. The
oom_reaper checks for this and sets MMF_OOM_SKIP only if the exit path is
not guaranteed to finish the job. This is patch 3, so see its changelog
for all the details.

I would really appreciate it if David could give this a try and see how it
behaves in workloads where the oom_reaper falls flat now. I have been
playing with sparsely allocated memory with a high pte/real memory ratio
and with large mlocked processes, and it worked reasonably well.

There is still some room for tuning here of course. We can change the
number of retries for the oom_reaper as well as the threshold for when to
keep retrying.

Michal Hocko (3):
  mm, oom: rework mmap_exit vs. oom_reaper synchronization
  mm, oom: keep retrying the oom_reap operation as long as there is substantial memory left
  mm, oom: hand over MMF_OOM_SKIP to exit path if it is guaranteed to finish

Diffstat:
 include/linux/oom.h |  2 --
 mm/internal.h       |  3 +++
 mm/memory.c         | 28 ++++++++++++++--------
 mm/mmap.c           | 69 +++++++++++++++++++++++++++++++++--------------------
 mm/oom_kill.c       | 45 ++++++++++++++++++++++++----------
 5 files changed, 97 insertions(+), 50 deletions(-)

[1] http://lkml.kernel.org/r/20180910125513.311-1-mhocko@xxxxxxxxxx