On 2019/01/26 4:47, Arkadiusz Miśkiewicz wrote: >> Can you please see whether the problem can be reproduced on the >> current linux-next? >> >> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > > I can reproduce on next (5.0.0-rc3-next-20190125), too: > Please try this patch. Subject: [PATCH v2] memcg: killed threads should not invoke memcg OOM killer From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> To: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Johannes Weiner <hannes@xxxxxxxxxxx>, David Rientjes <rientjes@xxxxxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxxxxx>, linux-mm@xxxxxxxxx, Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Message-ID: <01370f70-e1f6-ebe4-b95e-0df21a0bc15e@xxxxxxxxxxxxxxxxxxx> Date: Tue, 15 Jan 2019 19:17:27 +0900 If $N > $M, a single process with $N threads in a memcg group can easily kill all $M processes in that memcg group, for mem_cgroup_out_of_memory() does not check if current thread needs to invoke the memcg OOM killer. T1@P1 |T2...$N@P1|P2...$M |OOM reaper ----------+----------+----------+---------- # all sleeping try_charge() mem_cgroup_out_of_memory() mutex_lock(oom_lock) try_charge() mem_cgroup_out_of_memory() mutex_lock(oom_lock) out_of_memory() select_bad_process() oom_kill_process(P1) wake_oom_reaper() oom_reap_task() # ignores P1 mutex_unlock(oom_lock) out_of_memory() select_bad_process(P2...$M) # all killed by T2...$N@P1 wake_oom_reaper() oom_reap_task() # ignores P2...$M mutex_unlock(oom_lock) We don't need to invoke the memcg OOM killer if current thread was killed when waiting for oom_lock, for mem_cgroup_oom_synchronize(true) can count on try_charge() when mem_cgroup_oom_synchronize(true) can not make forward progress because try_charge() allows already killed/exiting threads to make forward progress, and memory_max_write() can bail out upon signals. At first Michal thought that fatal signal check is racy compared to tsk_is_oom_victim() check. But an experiment showed that trying to call mark_oom_victim() on all killed thread groups is more racy than fatal signal check due to task_will_free_mem(current) path in out_of_memory(). Therefore, this patch changes mem_cgroup_out_of_memory() to bail out upon should_force_charge() == T rather than upon fatal_signal_pending() == T, for should_force_charge() == T && signal_pending(current) == F at memory_max_write() can't happen because current thread won't call memory_max_write() after getting PF_EXITING. Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> Acked-by: Michal Hocko <mhocko@xxxxxxxx> Fixes: 29ef680ae7c2 ("memcg, oom: move out_of_memory back to the charge path") Fixes: 3100dab2aa09 ("mm: memcontrol: print proper OOM header when no eligible victim left") Cc: stable@xxxxxxxxxxxxxxx # 4.19+ --- mm/memcontrol.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index af7f18b..79a7d2a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -248,6 +248,12 @@ enum res_type { iter != NULL; \ iter = mem_cgroup_iter(NULL, iter, NULL)) +static inline bool should_force_charge(void) +{ + return tsk_is_oom_victim(current) || fatal_signal_pending(current) || + (current->flags & PF_EXITING); +} + /* Some nice accessors for the vmpressure. */ struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg) { @@ -1389,8 +1395,13 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, }; bool ret; - mutex_lock(&oom_lock); - ret = out_of_memory(&oc); + if (mutex_lock_killable(&oom_lock)) + return true; + /* + * A few threads which were not waiting at mutex_lock_killable() can + * fail to bail out. Therefore, check again after holding oom_lock. + */ + ret = should_force_charge() || out_of_memory(&oc); mutex_unlock(&oom_lock); return ret; } @@ -2209,9 +2220,7 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, * bypass the last charges so that they can exit quickly and * free their memory. */ - if (unlikely(tsk_is_oom_victim(current) || - fatal_signal_pending(current) || - current->flags & PF_EXITING)) + if (unlikely(should_force_charge())) goto force; /* -- 1.8.3.1