On Wed, 21 Jun 2017, Tetsuo Handa wrote:

> Umm... So, you are pointing out that select_bad_process() aborts based on
> TIF_MEMDIE or MMF_OOM_SKIP is broken because victim threads can be removed
> from global task list or cgroup's task list. Then, the OOM killer will have to
> wait until all mm_struct of interested OOM domain (system wide or some cgroup)
> is reaped by the OOM reaper. Simplest way is to wait until all mm_struct are
> reaped by the OOM reaper, for currently we are not tracking which memory cgroup
> each mm_struct belongs to, are we? But that can cause needless delay when
> multiple OOM events occurred in different OOM domains. Do we want to (and can we)
> make it possible to tell whether each mm_struct queued to the OOM reaper's list
> belongs to the thread calling out_of_memory() ?
>

I am saying that taking mmget() in mark_oom_victim() and then only dropping
it with mmput_async() after it can grab mm->mmap_sem, which the exit path
itself takes, or the oom reaper happens to schedule, causes __mmput() to be
called much later. Thus, with your patch, we remove the process from the
tasklist or call cgroup_exit() earlier than the memory can be unmapped.

As a result, subsequent calls to the oom killer kill everything before the
original victim's mm can undergo __mmput() because the oom reaper still
holds the reference.
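
To make the ordering concrete, here is a minimal userspace sketch of the
window described above. It is a toy model, not kernel code: toy_mm,
toy_task, toy_mmget(), toy_mmput() and toy_exit() are hypothetical
stand-ins that only model a user refcount and tasklist membership, and
locking, mmap_sem and the reaper's scheduling are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for mm_struct: a user refcount and a "torn down" flag. */
struct toy_mm {
	int users;      /* loosely models mm->mm_users */
	bool unmapped;  /* set when the last reference goes away (models __mmput()) */
};

/* Toy stand-in for a task: its mm and whether an OOM scan can still see it. */
struct toy_task {
	struct toy_mm *mm;
	bool on_tasklist;
};

static void toy_mmget(struct toy_mm *mm)
{
	mm->users++;
}

static void toy_mmput(struct toy_mm *mm)
{
	assert(mm->users > 0);
	if (--mm->users == 0)
		mm->unmapped = true;    /* only the final put tears the mm down */
}

/* Models the exit path: drop the task's own reference, then leave the tasklist. */
static void toy_exit(struct toy_task *t)
{
	toy_mmput(t->mm);
	t->on_tasklist = false;
}

/*
 * Returns true if, right after the victim exits, it is no longer visible
 * on the tasklist while its memory has not been unmapped yet. That is the
 * window in which a later OOM scan would go on to pick another victim.
 */
static bool toy_victim_gone_but_unfreed(bool pin_in_mark_oom_victim)
{
	struct toy_mm mm = { .users = 1, .unmapped = false };
	struct toy_task victim = { .mm = &mm, .on_tasklist = true };
	bool window;

	if (pin_in_mark_oom_victim)
		toy_mmget(&mm);         /* extra reference held for the reaper */

	toy_exit(&victim);
	window = !victim.on_tasklist && !mm.unmapped;

	if (pin_in_mark_oom_victim)
		toy_mmput(&mm);         /* the reaper drops its pin much later */

	assert(mm.unmapped);            /* the mm is torn down eventually either way */
	return window;
}
```

With the extra reference, toy_victim_gone_but_unfreed(true) reports the
window; without it, the exit path's own put is the final one and the
memory is gone by the time the task leaves the tasklist.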