From: Yafang Shao <laoar.shao@xxxxxxxxx>
Subject: memcg, oom: check memcg margin for parallel oom

Memcg oom killer invocation is synchronized by the global oom_lock and
tasks are sleeping on the lock while somebody is selecting the victim or
potentially racing with the oom_reaper which is releasing the victim's
memory.

This can result in a pointless oom killer invocation because a waiter
might be racing with the oom_reaper:

  P1                    oom_reaper                      P2

                        oom_reap_task                   mutex_lock(oom_lock)
                                                        out_of_memory
                                                          # no victim because we have one already
                        __oom_reap_task_mm              mutex_unlock(oom_lock)
  mutex_lock(oom_lock)
                        set MMF_OOM_SKIP
  select_bad_process
    # finds a new victim

The page allocator prevents this race by trying to allocate again after
the lock has been acquired (in __alloc_pages_may_oom), which acts as a
last minute check.  Moreover, the page allocator doesn't block on the
oom_lock at all; it simply retries the whole reclaim process.

The memcg oom killer should do the last minute check as well.  Call
mem_cgroup_margin to do that.  A trylock on the oom_lock could be done
as well, but that doesn't seem to be necessary at this stage.

[mhocko@xxxxxxxxxx: commit log]
Link: http://lkml.kernel.org/r/1594735034-19190-1-git-send-email-laoar.shao@xxxxxxxxx
Suggested-by: Michal Hocko <mhocko@xxxxxxxxxx>
Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
Acked-by: Michal Hocko <mhocko@xxxxxxxx>
Acked-by: Chris Down <chris@xxxxxxxxxxxxxx>
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/memcontrol.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/mm/memcontrol.c~memcg-oom-check-memcg-margin-for-parallel-oom
+++ a/mm/memcontrol.c
@@ -1663,15 +1663,21 @@ static bool mem_cgroup_out_of_memory(str
 		.gfp_mask = gfp_mask,
 		.order = order,
 	};
-	bool ret;
+	bool ret = true;
 
 	if (mutex_lock_killable(&oom_lock))
 		return true;
+
+	if (mem_cgroup_margin(memcg) >= (1 << order))
+		goto unlock;
+
 	/*
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
 	ret = should_force_charge() || out_of_memory(&oc);
+
+unlock:
 	mutex_unlock(&oom_lock);
 	return ret;
 }
_
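
For readers following along, here is a minimal user-space sketch of the
"re-check the margin after taking the lock, before killing" pattern that
the patch applies.  It is not kernel code: charge_margin(),
kill_a_victim(), group_out_of_memory() and margin_pages are made-up
stand-ins for mem_cgroup_margin(), out_of_memory(),
mem_cgroup_out_of_memory() and the memcg charge margin, a pthread mutex
stands in for the global oom_lock, and the amount "freed" by a kill is
arbitrary.  Build with: cc -pthread sketch.c

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t oom_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long margin_pages;	/* pretend headroom in the group */

/* Stand-in for mem_cgroup_margin(): pages that can still be charged. */
static unsigned long charge_margin(void)
{
	return __atomic_load_n(&margin_pages, __ATOMIC_RELAXED);
}

/* Stand-in for out_of_memory(): pretend to kill a task, freeing memory. */
static bool kill_a_victim(void)
{
	__atomic_add_fetch(&margin_pages, 512, __ATOMIC_RELAXED);
	puts("victim killed");
	return true;
}

/*
 * Mirrors the post-patch flow: a waiter that slept on oom_lock while a
 * parallel kill (or the reaper) freed memory sees a sufficient margin
 * and bails out instead of selecting another victim.  The kernel uses
 * mutex_lock_killable() here; plain pthread_mutex_lock() is used for
 * simplicity.
 */
static bool group_out_of_memory(int order)
{
	bool ret = true;

	pthread_mutex_lock(&oom_lock);
	if (charge_margin() >= (1UL << order))
		goto unlock;		/* enough room now, no kill needed */

	ret = kill_a_victim();
unlock:
	pthread_mutex_unlock(&oom_lock);
	return ret;
}

int main(void)
{
	group_out_of_memory(0);		/* first caller kills a victim */
	group_out_of_memory(0);		/* second caller sees the margin, skips */
	return 0;
}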