Re: [PATCH v2] memcg, oom: check memcg margin for parallel oom

Chris Down <chris@xxxxxxxxxxxxxx> · Tue, 14 Jul 2020 15:30:18 +0100

Yafang Shao writes:
Memcg oom killer invocation is synchronized by the global oom_lock, and
tasks sleep on the lock while somebody is selecting the victim or,
potentially, while the oom_reaper is releasing the victim's memory.
This can result in a pointless oom killer invocation because a waiter
might be racing with the oom_reaper:

       P1              oom_reaper              P2
                       oom_reap_task           mutex_lock(oom_lock)
                                               out_of_memory # no victim because we have one already
                       __oom_reap_task_mm      mutex_unlock(oom_lock)
mutex_lock(oom_lock)
                       set MMF_OOM_SKIP
select_bad_process
# finds a new victim

The page allocator prevents this race by trying to allocate after the
lock has been acquired (in __alloc_pages_may_oom), which acts as a
last-minute check. Moreover, the page allocator doesn't block on the
oom_lock; it simply retries the whole reclaim process.

Memcg oom killer should do the last-minute check as well. Call
mem_cgroup_margin to do that. A trylock on the oom_lock could be done as
well, but this doesn't seem to be necessary at this stage.
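
For illustration only, the last-minute check described above can be
sketched in userspace C roughly as follows. This is not the code from
the patch: every toy_* name is hypothetical, the pthread mutex merely
stands in for oom_lock, and "killing" a victim is modelled as simply
uncharging its pages.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup; not the kernel structure. */
struct toy_memcg {
	size_t usage;	/* pages currently charged */
	size_t limit;	/* hard limit, in pages */
	int kills;	/* number of victims selected so far */
};

/* Stand-in for the global oom_lock. */
static pthread_mutex_t toy_oom_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pages that can still be charged before the limit is hit. */
static size_t toy_margin(const struct toy_memcg *cg)
{
	return cg->usage < cg->limit ? cg->limit - cg->usage : 0;
}

/*
 * Toy OOM path for a charge of nr_pages. Returns true if a victim was
 * killed, false if the last-minute check found enough margin already.
 * victim_pages is an assumption of this model: the amount freed by a kill.
 */
static bool toy_memcg_oom(struct toy_memcg *cg, size_t nr_pages,
			  size_t victim_pages)
{
	bool killed = false;

	pthread_mutex_lock(&toy_oom_lock);

	/*
	 * Last-minute check: while this task slept on the lock, an earlier
	 * victim's memory may have been released. If the charge fits now,
	 * do not select another victim.
	 */
	if (toy_margin(cg) < nr_pages) {
		cg->usage -= victim_pages < cg->usage ? victim_pages : cg->usage;
		cg->kills++;
		killed = true;
	}

	pthread_mutex_unlock(&toy_oom_lock);
	return killed;
}
```

In this model, a second waiter that takes the lock after the first
kill has freed memory sees a sufficient margin and returns without
killing anything, which is the pointless invocation the check is meant
to avoid.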

[mhocko@xxxxxxxxxx: commit log]
Suggested-by: Michal Hocko <mhocko@xxxxxxxxxx>
Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>

Good catch, thanks.

Acked-by: Chris Down <chris@xxxxxxxxxxxxxx>