Re: [PATCH v2] memcg, oom: check memcg margin for parallel oom

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Thu, 16 Jul 2020 14:54:01 +0900

On 2020/07/16 2:30, David Rientjes wrote:
> But regardless of whether we present previous data to the user in the 
> kernel log or not, we've determined that oom killing a process is a 
> serious matter and go to any lengths possible to avoid having to do it.  
> For us, that means waiting until the "point of no return" to either go 
> ahead with oom killing a process or aborting and retrying the charge.
> 
> I don't think moving the mem_cgroup_margin() check to out_of_memory() 
> right before printing the oom info and killing the process is a very 
> invasive patch.  Any strong preference against doing it that way?  I think 
> moving the check as late as possible to save a process from being killed 
> when racing with an exiter or killed process (including perhaps current) 
> has a pretty clear motivation.
> 

How about ignoring MMF_OOM_SKIP once? I think this has almost the same
effect as moving the mem_cgroup_margin() check to out_of_memory()
right before printing the oom info and killing the process.

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 48e0db54d838..88170af3b9eb 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -322,7 +322,8 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 	 * any memory is quite low.
 	 */
 	if (!is_sysrq_oom(oc) && tsk_is_oom_victim(task)) {
-		if (test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags))
+		if (test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags) &&
+		    !test_and_clear_bit(MMF_OOM_REAP_QUEUED, &task->signal->oom_mm->flags))
 			goto next;
 		goto abort;
 	}
@@ -658,7 +659,8 @@ static int oom_reaper(void *unused)
 static void wake_oom_reaper(struct task_struct *tsk)
 {
 	/* mm is already queued? */
-	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
+	if (test_bit(MMF_OOM_SKIP, &tsk->signal->oom_mm->flags) ||
+	    test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
 		return;
 
 	get_task_struct(tsk);