On Thu 26-11-15 17:34:56, Michal Hocko wrote: > On Fri 27-11-15 00:24:43, Tetsuo Handa wrote: > > Michal Hocko wrote: [...] > > > + tlb_gather_mmu(&tlb, mm, 0, -1); > > > + for (vma = mm->mmap ; vma; vma = vma->vm_next) { > > > + if (is_vm_hugetlb_page(vma)) > > > + continue; > > > + > > > + /* > > > + * Only anonymous pages have a good chance to be dropped > > > + * without additional steps which we cannot afford as we > > > + * are OOM already. > > > + */ > > > + if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) > > > + unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end, > > > + &details); > > > > How do you plan to make sure that reclaimed pages are used by > > fatal_signal_pending() tasks? > > http://lkml.kernel.org/r/201509242050.EHE95837.FVFOOtMQHLJOFS@xxxxxxxxxxxxxxxxxxx > > http://lkml.kernel.org/r/201510121543.EJF21858.LtJFHOOOSQVMFF@xxxxxxxxxxxxxxxxxxx > > Well the wake_oom_reaper is responsible to hand over mm of the OOM > victim and as such it should be a killed process. I guess you mean that > the mm might be shared with another process which is hidden from the OOM > killer, right? Well I think this is not something to care about at this > layer. We shouldn't select a tasks which can lead to this situation in > the first place. Such an oom victim is basically selected incorrectly. I > think we can handle that by a flag in mm_struct. > > I guess we have never cared too much about this case because it sounds > like a misconfiguration. If you want to shoot your own head the do it. > It is true that this patch will make such a case more prominent because > we can cause a side effect now. I can cook up a patch to help to sort > this out. > > Thanks for pointing this out. OK, so I was thinking about that some more and came to the conclusion that we cannot use per mm struct flag. This would be basically equivalent to moving oom_score_adj to mm_struct which has shown to be a problem in the past (especially for vfork(); set_oom_score; execve() loads). So I've ended up with the following. It would mean we will not use the reaper in some cases but those should be marginal and some of them even dubious at best. --- diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 043c0fe146a5..bceeebe96a1b 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -646,6 +646,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p, unsigned int victim_points = 0; static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST); + bool can_oom_reap = true; /* * If the task is already exiting, don't alarm the sysadmin or kill @@ -736,15 +737,22 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p, continue; if (same_thread_group(p, victim)) continue; - if (unlikely(p->flags & PF_KTHREAD)) - continue; - if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) + if (unlikely(p->flags & PF_KTHREAD) || + p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) { + /* + * We cannot usee oom_reaper for the mm shared by this process + * because it wouldn't get killed and so the memory might be + * still used. + */ + can_oom_reap = false; continue; + } do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true); } rcu_read_unlock(); - wake_oom_reaper(mm); + if (can_oom_reap) + wake_oom_reaper(mm); mmdrop(mm); put_task_struct(victim); } -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>