Re: [PATCH 6/5] oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Michal Hocko wrote:
> > Moreover, why don't you do like
> > 
> >   struct mm_struct {
> >   	(...snipped...)
> >   	struct list_head oom_reaper_list;
> >   	(...snipped...)
> >   }
> 
> Because we would need to search all tasks sharing the same mm in order
> to exit_oom_victim.

We should clear TIF_MEMDIE eventually in case reaping this mm was not sufficient.
But I'm not sure whether clearing TIF_MEMDIE immediately after reaping completed
is preferable. Clearing TIF_MEMDIE some time later might be better.

For example,

	mutex_lock(&oom_lock);
	exit_oom_victim(tsk);
	mutex_unlock(&oom_lock);

will avoid race between ongoing select_bad_process() from out_of_memory()
and oom_reap_task().

For example,

	mutex_lock(&oom_lock);
	msleep(100);
	exit_oom_victim(tsk);
	mutex_unlock(&oom_lock);

would make more unlikely to OOM-kill next victim by giving some time to
allow get_page_from_freelist() to succeed.

> 
> > than
> > 
> >   struct task_struct {
> >   	(...snipped...)
> >   	struct list_head oom_reaper_list;
> >   	(...snipped...)
> >   }
> > 
> > so that we can update all ->oom_score_adj using this mm_struct for handling
> > crazy combo ( http://lkml.kernel.org/r/20160204163113.GF14425@xxxxxxxxxxxxxx ) ?
> 
> I find it much easier to to simply skip over tasks with MMF_OOM_KILLED
> when already selecting a victim. We won't need oom_score_adj games at
> all. This needs a deeper evaluation though. I didn't get to it yet,
> but the point of having MMF flag which is not oom_reaper specific
> was to have it reusable in other contexts as well.

I think below patch can handle crazy combo without using MMF flag.

----------
>From 47c4bba988eae2f761894c679e6f57668d4cacd3 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 29 Feb 2016 09:57:30 +0900
Subject: [PATCH 1/2] mm,oom: Mark OOM-killed processes as no longer
 OOM-unkillable.

Since OOM_SCORE_ADJ_MIN means that the OOM killer must not send SIGKILL
on that process, changing oom_score_adj to OOM_SCORE_ADJ_MIN as soon as
SIGKILL was sent on that process should be OK. This will also solve a
problem that SysRq-f chooses the same process forever.

Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
---
 mm/oom_kill.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5d5eca9..f5dddc3 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -497,7 +497,6 @@ static bool __oom_reap_task(struct task_struct *tsk)
 	 * improvements. This also means that selecting this task doesn't
 	 * make any sense.
 	 */
-	tsk->signal->oom_score_adj = OOM_SCORE_ADJ_MIN;
 	exit_oom_victim(tsk);
 out:
 	mmput(mm);
@@ -775,8 +774,6 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	for_each_process(p) {
 		if (!process_shares_mm(p, mm))
 			continue;
-		if (same_thread_group(p, victim))
-			continue;
 		if (unlikely(p->flags & PF_KTHREAD) || is_global_init(p) ||
 		    p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
 			/*
@@ -788,6 +785,9 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 			continue;
 		}
 		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
+		task_lock(p);
+		p->signal->oom_score_adj = OOM_SCORE_ADJ_MIN;
+		task_unlock(p);
 	}
 	rcu_read_unlock();
 
----------

Regarding oom_kill_allocating_task = 1 case, we can do below if we want to
OOM-kill current before trying to OOM-kill children of current.

----------
>From b0f38de7f07328bca65c5d4b0361d5f73283a6e4 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 29 Feb 2016 10:03:13 +0900
Subject: [PATCH 2/2] mm,oom: Don't try to OOM-kill children if
 oom_kill_allocating_task = 1.

Trying to OOM-kill children of current can cause killing more than
needed if OOM-killed children can not release memory immediately
because current does not wait for OOM-killed children to release memory.
oom_kill_allocating_task = 1 should try to OOM-kill current first.

Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
---
 mm/oom_kill.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index f5dddc3..9098d48 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -682,6 +682,13 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
 					      DEFAULT_RATELIMIT_BURST);
 	bool can_oom_reap;
+	const bool kill_current = !p;
+	
+	if (kill_current) {
+		p = current;
+		victim = p;
+		get_task_struct(p);
+	}
 
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
@@ -702,6 +709,8 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	pr_err("%s: Kill process %d (%s) score %u or sacrifice child\n",
 		message, task_pid_nr(p), p->comm, points);
 
+	if (kill_current)
+		goto kill;
 	/*
 	 * If any of p's children has a different mm and is eligible for kill,
 	 * the one with the highest oom_badness() score is sacrificed for its
@@ -730,6 +739,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	}
 	read_unlock(&tasklist_lock);
 
+kill:
 	p = find_lock_task_mm(victim);
 	if (!p) {
 		put_task_struct(victim);
@@ -889,8 +899,7 @@ bool out_of_memory(struct oom_control *oc)
 	if (sysctl_oom_kill_allocating_task && current->mm &&
 	    !oom_unkillable_task(current, NULL, oc->nodemask) &&
 	    current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
-		get_task_struct(current);
-		oom_kill_process(oc, current, 0, totalpages, NULL,
+		oom_kill_process(oc, NULL, 0, totalpages, NULL,
 				 "Out of memory (oom_kill_allocating_task)");
 		return true;
 	}
----------

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]