On Tue 14-07-20 08:08:32, Yafang Shao wrote:
> The commit 7775face2079 ("memcg: killed threads should not invoke memcg OOM
> killer") resolves the problem of different threads in a multi-threaded
> task doing parallel memcg oom, but it doesn't solve the problem of
> different tasks doing parallel memcg oom.
> 
> It may happen that many different tasks in the same memcg are waiting
> on oom_lock at the same time; if one of them has already made progress
> and freed enough available memory, the others don't need to trigger the
> oom killer again. Checking the memcg margin after holding oom_lock
> helps achieve that.

While the changelog makes sense, I believe it can be improved. I would
use something like the following. Feel free to use its parts.

"
Memcg oom killer invocation is synchronized by the global oom_lock.
Tasks are sleeping on the lock while somebody is selecting the victim,
or potentially racing with the oom_reaper which is releasing the
victim's memory. This can result in a pointless oom killer invocation
because a waiter might be racing with the oom_reaper:

P1                      oom_reaper              P2
                        oom_reap_task
mutex_lock(oom_lock)
out_of_memory # no victim because we have one already
                        __oom_reap_task_mm
mutex_unlock(oom_lock)
                                                mutex_lock(oom_lock)
                        set MMF_OOM_SKIP
                                                select_bad_process # finds a new victim

The page allocator prevents this race by trying to allocate again once
the lock can be acquired (in __alloc_pages_may_oom), which acts as a
last minute check. Moreover, the page allocator doesn't block on the
oom_lock at all and simply retries the whole reclaim process.

The memcg oom killer should do the last minute check as well. Call
mem_cgroup_margin to do that. A trylock on the oom_lock could be done
as well but this doesn't seem to be necessary at this stage.
"

> Suggested-by: Michal Hocko <mhocko@xxxxxxxxxx>
> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> ---
>  mm/memcontrol.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1962232..df141e1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1560,16 +1560,31 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  		.gfp_mask = gfp_mask,
>  		.order = order,
>  	};
> -	bool ret;
> +	bool ret = true;
> 
>  	if (mutex_lock_killable(&oom_lock))
>  		return true;
> +
>  	/*
>  	 * A few threads which were not waiting at mutex_lock_killable() can
>  	 * fail to bail out. Therefore, check again after holding oom_lock.
>  	 */
> -	ret = should_force_charge() || out_of_memory(&oc);
> +	if (should_force_charge())
> +		goto out;
> +
> +	/*
> +	 * Different tasks may be doing parallel oom, so after hold the
> +	 * oom_lock the task should check the memcg margin again to check
> +	 * whether other task has already made progress.
> +	 */
> +	if (mem_cgroup_margin(memcg) >= (1 << order))
> +		goto out;

Is there any reason why you simply haven't done this? (+ your comment
which is helpful).
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 248e6cad0095..2c176825efe3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1561,15 +1561,21 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		.order = order,
 		.chosen_points = LONG_MIN,
 	};
-	bool ret;
+	bool ret = true;
 
-	if (mutex_lock_killable(&oom_lock))
+	if (!mutex_trylock(&oom_lock))
 		return true;
+
+	if (mem_cgroup_margin(memcg) >= (1 << order))
+		goto unlock;
+
 	/*
 	 * A few threads which were not waiting at mutex_lock_killable() can
 	 * fail to bail out. Therefore, check again after holding oom_lock.
 	 */
 	ret = should_force_charge() || out_of_memory(&oc);
+
+unlock:
 	mutex_unlock(&oom_lock);
 	return ret;
 }

> +
> +	ret = out_of_memory(&oc);
> +
> +out:
>  	mutex_unlock(&oom_lock);
> +
>  	return ret;
>  }
> 
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs
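
P.S. To make the "last minute check" pattern under discussion concrete,
here is a minimal, self-contained userspace sketch: several tasks race
into the oom slow path, but waiters that eventually acquire the lock
re-check whether another task has already made progress before invoking
the kill. This is illustrative only, not kernel code; available_pages,
charge_slow_path, and do_oom_kill are hypothetical stand-ins for
mem_cgroup_margin(), mem_cgroup_out_of_memory(), and out_of_memory().

/*
 * Illustrative userspace model of the last-minute-check pattern.
 * Four threads hit the "oom" slow path at once; only the first one to
 * take the lock actually kills, the rest observe the replenished
 * margin and bail out.  Build with: gcc -pthread sketch.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t oom_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_long available_pages;	/* stand-in for the memcg margin */

static bool do_oom_kill(void)
{
	/* Pretend the victim's exit released memory back to the group. */
	atomic_fetch_add(&available_pages, 256);
	puts("oom killer invoked");
	return true;
}

static bool charge_slow_path(long needed)
{
	bool ret = true;

	pthread_mutex_lock(&oom_lock);
	/*
	 * Last minute check: while this task slept on oom_lock, another
	 * task may have already killed a victim and freed enough memory,
	 * so a second kill would be pointless.
	 */
	if (atomic_load(&available_pages) >= needed)
		goto unlock;

	ret = do_oom_kill();
unlock:
	pthread_mutex_unlock(&oom_lock);
	return ret;
}

static void *worker(void *arg)
{
	(void)arg;
	charge_slow_path(64);	/* each thread "needs" 64 pages */
	return NULL;
}

int main(void)
{
	pthread_t t[4];

	for (int i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	/* Expected: "oom killer invoked" is printed exactly once. */
	return 0;
}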