Re: [RFC PATCH] mm, oom: introduce oom reaper

Michal Hocko <mhocko@xxxxxxxxxx> · Thu, 26 Nov 2015 18:31:00 +0100

On Thu 26-11-15 17:34:56, Michal Hocko wrote:
> On Fri 27-11-15 00:24:43, Tetsuo Handa wrote:
> > Michal Hocko wrote:
[...]
> > > +	tlb_gather_mmu(&tlb, mm, 0, -1);
> > > +	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
> > > +		if (is_vm_hugetlb_page(vma))
> > > +			continue;
> > > +
> > > +		/*
> > > +		 * Only anonymous pages have a good chance to be dropped
> > > +		 * without additional steps which we cannot afford as we
> > > +		 * are OOM already.
> > > +		 */
> > > +		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
> > > +			unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
> > > +					 &details);
> > 
> > How do you plan to make sure that reclaimed pages are used by
> > fatal_signal_pending() tasks?
> > http://lkml.kernel.org/r/201509242050.EHE95837.FVFOOtMQHLJOFS@xxxxxxxxxxxxxxxxxxx
> > http://lkml.kernel.org/r/201510121543.EJF21858.LtJFHOOOSQVMFF@xxxxxxxxxxxxxxxxxxx
> 
> Well the wake_oom_reaper is responsible to hand over mm of the OOM
> victim and as such it should be a killed process.  I guess you mean that
> the mm might be shared with another process which is hidden from the OOM
> killer, right? Well I think this is not something to care about at this
> layer. We shouldn't select a tasks which can lead to this situation in
> the first place. Such an oom victim is basically selected incorrectly. I
> think we can handle that by a flag in mm_struct.
> 
> I guess we have never cared too much about this case because it sounds
> like a misconfiguration. If you want to shoot your own head the do it.
> It is true that this patch will make such a case more prominent because
> we can cause a side effect now. I can cook up a patch to help to sort
> this out.
> 
> Thanks for pointing this out.

OK, so I was thinking about that some more and came to the conclusion
that we cannot use per mm struct flag. This would be basically
equivalent to moving oom_score_adj to mm_struct which has shown to be a
problem in the past (especially for vfork(); set_oom_score; execve()
loads). So I've ended up with the following. It would mean we will not
use the reaper in some cases but those should be marginal and some of
them even dubious at best.
---

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 043c0fe146a5..bceeebe96a1b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -646,6 +646,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	unsigned int victim_points = 0;
 	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
 					      DEFAULT_RATELIMIT_BURST);
+	bool can_oom_reap = true;
 
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
@@ -736,15 +737,22 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 			continue;
 		if (same_thread_group(p, victim))
 			continue;
-		if (unlikely(p->flags & PF_KTHREAD))
-			continue;
-		if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
+		if (unlikely(p->flags & PF_KTHREAD) ||
+		    p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
+			/*
+			 * We cannot usee oom_reaper for the mm shared by this process
+			 * because it wouldn't get killed and so the memory might be
+			 * still used.
+			 */
+			can_oom_reap = false;
 			continue;
+		}
 
 		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
 	}
 	rcu_read_unlock();
-	wake_oom_reaper(mm);
+	if (can_oom_reap)
+		wake_oom_reaper(mm);
 	mmdrop(mm);
 	put_task_struct(victim);
 }
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>