On Tue 20-06-17 15:12:55, David Rientjes wrote:
[...]
> This doesn't prevent serial oom killing for either the system oom killer
> or for the memcg oom killer.
>
> The oom killer cannot detect tsk_is_oom_victim() if the task has either
> been removed from the tasklist or has already done cgroup_exit(). For
> memcg oom killings in particular, cgroup_exit() is usually called very
> shortly after the oom killer has sent the SIGKILL. If the oom reaper does
> not fail (for example by failing to grab mm->mmap_sem) before another
> memcg charge after cgroup_exit(victim), additional processes are killed
> because the iteration does not view the victim.
>
> This easily kills all processes attached to the memcg with no memory
> freeing from any victim.

It took me some time to decrypt the above, but you are right. Pinning
mm_users will prevent the exit path from reaching exit_mmap, and that can
indeed cause another premature oom kill, because the task might be unhashed
or removed from the memcg before the oom reaper has a chance to reap it.
Thanks for pointing this out.

This means that we either have to reimplement the unhashing/cgroup_exit
for oom victims or go back to allowing the oom reaper to race with
exit_mmap. The latter sounds much easier to me. I was offline the last two
days, but I will revisit my original idea ASAP.
--
Michal Hocko
SUSE Labs