Memcg oom killer invocation is synchronized by the global oom_lock and
tasks sleep on that lock while somebody else is selecting the victim,
potentially racing with the oom_reaper which is releasing the victim's
memory.

This can result in a pointless oom killer invocation because a waiter
might be racing with the oom_reaper:

P1                      oom_reaper                P2
                        oom_reap_task             mutex_lock(oom_lock)
                                                  out_of_memory
                                                    # no victim because we have one already
                        __oom_reap_task_mm        mutex_unlock(oom_lock)
mutex_lock(oom_lock)
                        set MMF_OOM_SKIP
select_bad_process
  # finds a new victim

The page allocator prevents this race by trying the allocation again once
the lock has been acquired (in __alloc_pages_may_oom), which acts as a
last minute check.  Moreover, the page allocator doesn't block on the
oom_lock at all and simply retries the whole reclaim process.

The memcg oom killer should do the same last minute check.  Call
mem_cgroup_margin to do that.  A trylock on the oom_lock could be done as
well, but that doesn't seem to be necessary at this stage.

[mhocko@xxxxxxxxxx: commit log]
Suggested-by: Michal Hocko <mhocko@xxxxxxxxxx>
Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
---
v1 -> v2:
 - commit log improved by Michal
 - retitle the subject from "mm, oom: check memcg margin for parallel oom"
 - code simplicity, per Michal
---
 mm/memcontrol.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1962232..15e0e18 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1560,15 +1560,21 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		.gfp_mask = gfp_mask,
 		.order = order,
 	};
-	bool ret;
+	bool ret = true;
 
 	if (mutex_lock_killable(&oom_lock))
 		return true;
+
+	if (mem_cgroup_margin(memcg) >= (1 << order))
+		goto unlock;
+
 	/*
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
 	ret = should_force_charge() || out_of_memory(&oc);
+
+unlock:
 	mutex_unlock(&oom_lock);
 	return ret;
 }
-- 
1.8.3.1
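
For context: mem_cgroup_margin() returns roughly how many pages the memcg
can still charge before hitting its memory (or memory+swap) limit, so the
new check skips the kill whenever a charge of 1 << order pages can now
succeed, e.g. because the oom_reaper freed the previous victim's memory
while this task was waiting on oom_lock.  A simplified sketch of that
helper (not part of this patch; the exact code differs between kernel
versions):

	/* Simplified sketch of mem_cgroup_margin(); for illustration only. */
	static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
	{
		unsigned long margin = 0;
		unsigned long count;
		unsigned long limit;

		/* Room left under the memory limit, in pages. */
		count = page_counter_read(&memcg->memory);
		limit = READ_ONCE(memcg->memory.max);
		if (count < limit)
			margin = limit - count;

		/* With memory+swap accounting, the tighter limit wins. */
		if (do_memsw_account()) {
			count = page_counter_read(&memcg->memsw);
			limit = READ_ONCE(memcg->memsw.max);
			if (count < limit)
				margin = min(margin, limit - count);
			else
				margin = 0;
		}

		return margin;
	}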