Re: [PATCH] mm,oom: Set ->signal->oom_mm to all thread groups sharing victim's memory.

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Sat, 6 Jan 2018 20:01:30 +0900

Michal Hocko wrote:
> On Sat 06-01-18 16:37:17, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 19-12-17 20:26:14, Tetsuo Handa wrote:
> > > > When the OOM reaper set MMF_OOM_SKIP on the victim's mm before threads
> > > > sharing that mm get ->signal->oom_mm, the comment "That thread will now
> > > > get access to memory reserves since it has a pending fatal signal." no
> > > > longer stands. Also, since we introduced ALLOC_OOM watermark, the comment
> > > > "They don't get access to memory reserves, though, to avoid depletion of
> > > > all memory." no longer stands.
> > > > 
> > > > This patch treats all thread groups sharing the victim's mm evenly,
> > > > and updates the outdated comment.
> > > 
> > > Nack with a real life example where this matters.
> > 
> > You did not respond to
> > http://lkml.kernel.org/r/201712232341.FGC64072.VFLOOJOtFSFMHQ@xxxxxxxxxxxxxxxxxxx ,
> 
> Yes I haven't because there is simply no point continuing this
> discussion. You are simply immune to any arguments.
> 
> > and I observed needless OOM-killing. Therefore, I push this patch again.
> 
> Yes, the life is tough and oom heuristic might indeed kill more tasks
> for some workloads. But as long as those needless oom killing happens
> for artificial workloads I am not all that much interested.  Show me
> some workload that is actually real and we can make the current code
> more complicated. Without that my position remains.

That is a catch-22 requirement. A workload that is actually real would be
a case which failed to take mmap_sem for read. But we will not be there to
observe it when that happens on a real system which we cannot access.

Anyway, a short version is shown below.

From f053ed1430e94b5c371a26b8c3903d27bcdcb0a0 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Date: Sat, 6 Jan 2018 19:41:20 +0900
Subject: [PATCH] mm, oom: task_will_free_mem should skip oom_victim tasks

Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
oom_reaped tasks") should have checked ->signal->oom_mm rather than
MMF_OOM_SKIP. The intention of that commit was to avoid OOM lockup due
to infinite retries, but in the clone(CLONE_VM && !CLONE_SIGHAND) case
checking MMF_OOM_SKIP causes premature selection of the next OOM victim.

Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: Roman Gushchin <guro@xxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Vladimir Davydov <vdavydov.dev@xxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
---
 mm/oom_kill.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8219001..9526ba8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -804,11 +804,8 @@ static bool task_will_free_mem(struct task_struct *task)
 	if (!__task_will_free_mem(task))
 		return false;
 
-	/*
-	 * This task has already been drained by the oom reaper so there are
-	 * only small chances it will free some more
-	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags))
+	/* Skip tasks which tried ALLOC_OOM but still cannot make progress. */
+	if (tsk_is_oom_victim(task))
 		return false;
 
 	if (atomic_read(&mm->mm_users) <= 1)
-- 
1.8.3.1
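
For illustration only, the difference between the two checks in the hunk
above can be sketched as a small userspace model. This is not kernel
code: struct mm_model, struct signal_model, struct task_model and the
three helpers are hypothetical names made up for this sketch. It assumes
that MMF_OOM_SKIP is a property of the shared mm, while ->signal->oom_mm
is set per thread group, which is what tsk_is_oom_victim() tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; simplified, not the kernel's structures. */
struct mm_model {
	bool oom_skip;			/* models MMF_OOM_SKIP in mm->flags */
};

struct signal_model {
	struct mm_model *oom_mm;	/* models ->signal->oom_mm */
};

struct task_model {
	struct mm_model *mm;		/* shared when CLONE_VM is used */
	struct signal_model *signal;	/* one per thread group */
};

/* Old check: true once the shared mm has been marked as reaped. */
static inline bool skipped_by_mm_flag(const struct task_model *t)
{
	assert(t && t->mm);
	return t->mm->oom_skip;
}

/* New check: true only if this thread group was itself marked. */
static inline bool skipped_by_oom_victim(const struct task_model *t)
{
	assert(t && t->signal);
	return t->signal->oom_mm != NULL;
}

/* Models marking one thread group as an OOM victim. */
static inline void mark_victim(struct task_model *t)
{
	assert(t && t->mm && t->signal);
	t->signal->oom_mm = t->mm;
}
```

Take two task_model instances A and B that point at the same mm_model
but at different signal_model instances. After mark_victim() on A and
setting oom_skip on the shared mm, skipped_by_mm_flag() is true for both
A and B, whereas skipped_by_oom_victim() is true only for A. In this
model, B is refused under the old check even though it was never marked
as a victim.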
