[PATCH v2] mm,oom: Allow SysRq-f to always select !TIF_MEMDIE thread group.

There have been three problems with the SysRq-f (manual invocation of the
OOM killer) case. To keep the description simple, this patch assumes a
situation where the OOM reaper is not called (because the OOM victim's mm
is shared by unkillable threads) or is not available (due to kthread_run()
failure or CONFIG_MMU=n).

The first is that moom_callback() is not called by moom_work in an OOM
livelock situation because moom_work does not have a dedicated workqueue
like vmstat_wq. This problem is not fixed yet.

The second is that select_bad_process() chooses a thread group which
already has a TIF_MEMDIE thread. Since commit f44666b04605d1c7 ("mm,oom:
speed up select_bad_process() loop") changed oom_scan_process_thread() to
use task->signal->oom_victims, the non-SysRq-f case no longer selects a
thread group which already has a TIF_MEMDIE thread. But the SysRq-f case
still selects such a thread group because OOM_SCAN_OK is returned. This
patch makes sure that oom_badness() is skipped by making
oom_scan_process_thread() return OOM_SCAN_CONTINUE for the SysRq-f case.
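
The intended decision can be sketched as a small userspace model. This is
only an illustration, not kernel code: scan_result, scan_thread() and its
two parameters are hypothetical stand-ins for enum oom_scan_t,
oom_scan_process_thread(), atomic_read(&task->signal->oom_victims) and
is_sysrq_oom(oc), and the remaining checks of the real function are
omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the enum oom_scan_t values used here. */
enum scan_result { SCAN_OK, SCAN_CONTINUE, SCAN_ABORT };

/*
 * Hypothetical model of the patched check. oom_victims stands in for
 * atomic_read(&task->signal->oom_victims), is_sysrq for is_sysrq_oom(oc).
 */
static enum scan_result scan_thread(int oom_victims, bool is_sysrq)
{
	assert(oom_victims >= 0);	/* a victim count is never negative */

	/*
	 * A thread group that already has a victim is skipped for SysRq-f
	 * and aborts the scan otherwise.
	 */
	if (oom_victims > 0)
		return is_sysrq ? SCAN_CONTINUE : SCAN_ABORT;

	return SCAN_OK;	/* the remaining kernel checks are omitted */
}
```

In this model, a thread group with a pending victim is never scored for
SysRq-f, which is the behaviour the description above asks for.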

The third is that oom_kill_process() chooses a thread group which already
has a TIF_MEMDIE thread when the candidate chosen by select_bad_process()
has children, because oom_badness() does not take TIF_MEMDIE into account.
This patch checks child->signal->oom_victims before calling oom_badness()
if oom_kill_process() was called for the SysRq-f case. This mirrors how
select_bad_process() skips oom_badness() when OOM_SCAN_CONTINUE is
returned.
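
A rough userspace sketch of the child selection follows. It is an
assumption-laden model rather than the kernel loop: struct child_model,
pick_child() and the points field are hypothetical, the score stands in
for the oom_badness() result, and the comparison against the parent's own
score is left out (a return value of -1 simply means no child was chosen).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one child thread group. */
struct child_model {
	unsigned int points;	/* stand-in for the oom_badness() result */
	int oom_victims;	/* stand-in for child->signal->oom_victims */
};

/*
 * Return the index of the child with the highest score, or -1 if no
 * child qualifies. For SysRq-f, children that already contain an OOM
 * victim are skipped before their score is looked at.
 */
static int pick_child(const struct child_model *c, size_t n, bool is_sysrq)
{
	unsigned int best_points = 0;
	int best = -1;

	assert(c != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (is_sysrq && c[i].oom_victims > 0)
			continue;	/* never pick the same victim again */
		if (c[i].points > best_points) {
			best_points = c[i].points;
			best = (int)i;
		}
	}
	return best;
}
```

With the is_sysrq test removed from the skip condition, every invocation
would move on to the next child, which is the concern about
sysctl_oom_kill_allocating_task raised in this description.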

If we did not limit the child->signal->oom_victims check to the SysRq-f
case, we would break the sysctl_oom_kill_allocating_task case: because
oom_scan_process_thread() is not called in that case, all children of the
candidate would be killed immediately whenever killing some child did not
immediately solve the OOM situation. Handling this will require either
marking such a child as unkillable after some reasonable period or making
sysctl_oom_kill_allocating_task literally kill the allocating task. In any
case, this patch addresses only the SysRq-f case.

Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
---
 mm/oom_kill.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1685890..159063e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -283,8 +283,8 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 	 * This task already has access to memory reserves and is being killed.
 	 * Don't allow any other task to have access to the reserves.
 	 */
-	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims))
-		return OOM_SCAN_ABORT;
+	if (atomic_read(&task->signal->oom_victims))
+		return !is_sysrq_oom(oc) ? OOM_SCAN_ABORT : OOM_SCAN_CONTINUE;
 
 	/*
 	 * If task is allocating a lot of memory and has been marked to be
@@ -793,6 +793,14 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 			if (process_shares_mm(child, p->mm))
 				continue;
 			/*
+			 * Don't select TIF_MEMDIE child by SysRq-f case, or
+			 * we will get stuck by selecting the same TIF_MEMDIE
+			 * child forever.
+			 */
+			if (is_sysrq_oom(oc) &&
+			    atomic_read(&child->signal->oom_victims))
+				continue;
+			/*
 			 * oom_badness() returns 0 if the thread is unkillable
 			 */
 			child_points = oom_badness(child, memcg, oc->nodemask,
-- 
1.8.3.1
