On Wed 14-06-17 16:43:03, David Rientjes wrote:
> If mm->mm_users is not incremented because it is already zero by the oom
> reaper, meaning the final refcount has been dropped, do not set
> MMF_OOM_SKIP prematurely.
>
> __mmput() may not have had a chance to do exit_mmap() yet, so memory from
> a previous oom victim is still mapped.

True, but do we have a _guarantee_ it will do it? E.g. can somebody block
exit_aio from completing? Or can somebody hold mmap_sem and thus block
ksm_exit or khugepaged_exit from completing? The reason why I was
conservative and set such an mm as MMF_OOM_SKIP was that I couldn't
give a definitive answer to those questions. And we really _want_ to
have a guarantee of forward progress here. Killing an additional
process is a price to pay, and if that doesn't trigger normally it sounds
like a reasonable compromise to me.

> __mmput() naturally requires no
> references on mm->mm_users to do exit_mmap().
>
> Without this, several processes can be oom killed unnecessarily and the
> oom log can show an abundance of memory available if exit_mmap() is in
> progress at the time the process is skipped.

Have you seen this happening in real life?
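
To make the forward-progress concern above concrete, here is a rough
userspace sketch. It is not kernel code: struct model_mm, model_reap(),
model_oom_can_progress() and the exit_blocked flag are all made up for
illustration, and the mmap_sem retry path is left out entirely. It only
models the case where mm_users has already dropped to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an oom victim's mm; not the kernel's mm_struct. */
struct model_mm {
	int  mm_users;      /* 0 means the final reference was already dropped */
	bool exit_blocked;  /* assumption: teardown in __mmput() is stuck */
	bool oom_skip;      /* stands in for MMF_OOM_SKIP */
};

enum reap_policy {
	SKIP_ALWAYS,          /* behaviour before the patch */
	SKIP_ONLY_IF_REAPED   /* behaviour the patch proposes (simplified) */
};

/* Models one reaper pass over the victim. */
static void model_reap(struct model_mm *mm, enum reap_policy policy)
{
	bool pinned = mm->mm_users > 0;  /* models mmget_not_zero() succeeding */

	/* A pinned mm gets reaped and is hidden under either policy. */
	if (pinned || policy == SKIP_ALWAYS)
		mm->oom_skip = true;
}

/*
 * The oom killer can move on if the victim is hidden from it, or if the
 * victim's own teardown is able to finish and release the memory.
 */
static bool model_oom_can_progress(const struct model_mm *mm)
{
	return mm->oom_skip || !mm->exit_blocked;
}
```

With mm_users == 0 and a blocked teardown, SKIP_ONLY_IF_REAPED leaves the
victim visible and the model reports no progress, whereas SKIP_ALWAYS
reports progress at the cost of possibly selecting another victim. Whether
the teardown can really get blocked is exactly the open question.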

> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> ---
>  mm/oom_kill.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -531,6 +531,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
>  					 NULL);
>  	}
>  	tlb_finish_mmu(&tlb, 0, -1);
> +	set_bit(MMF_OOM_SKIP, &mm->flags);
>  	pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
>  			task_pid_nr(tsk), tsk->comm,
>  			K(get_mm_counter(mm, MM_ANONPAGES)),
> @@ -562,7 +563,11 @@ static void oom_reap_task(struct task_struct *tsk)
>  	if (attempts <= MAX_OOM_REAP_RETRIES)
>  		goto done;
>
> -
> +	/*
> +	 * Hide this mm from OOM killer because it cannot be reaped since
> +	 * mm->mmap_sem cannot be acquired.
> +	 */
> +	set_bit(MMF_OOM_SKIP, &mm->flags);
>  	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
>  		task_pid_nr(tsk), tsk->comm);
>  	debug_show_all_locks();
> @@ -570,12 +575,6 @@ static void oom_reap_task(struct task_struct *tsk)
>  done:
>  	tsk->oom_reaper_list = NULL;
>
> -	/*
> -	 * Hide this mm from OOM killer because it has been either reaped or
> -	 * somebody can't call up_write(mmap_sem).
> -	 */
> -	set_bit(MMF_OOM_SKIP, &mm->flags);
> -
>  	/* Drop a reference taken by wake_oom_reaper */
>  	put_task_struct(tsk);
>  }

-- 
Michal Hocko
SUSE Labs