On Fri 29-05-15 21:40:47, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Thu 28-05-15 06:59:32, Tetsuo Handa wrote: > > > I just imagined a case where p is blocked at down_read() in acct_collect() from > > > do_exit() when p is sharing mm with other processes, and other process is doing > > > blocking operation with mm->mmap_sem held for writing. Is such case impossible? > > > > It is very much possible and I have missed this case when proposing > > my alternative. The other process could be doing an address space > > operation e.g. mmap which requires an allocation. > > Are there locations that do memory allocations with mm->mmap_sem held for > writing? Yes, I've written that in my previous email. > Is it possible that thread1 is doing memory allocation between > down_write(¤t->mm->mmap_sem) and up_write(¤t->mm->mmap_sem), > thread2 sharing the same mm is waiting at down_read(¤t->mm->mmap_sem), > and the OOM killer invoked by thread3 chooses thread2 as the OOM victim and > sets TIF_MEMDIE to thread2? Your usage of thread is confusing. Threads are of no concerns because those get killed when the group leader is killed. If you refer to processes then this is exactly what is handled by: for_each_process(p) if (p->mm == mm && !same_thread_group(p, victim) && !(p->flags & PF_KTHREAD)) { if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) continue; task_lock(p); /* Protect ->comm from prctl() */ pr_err("Kill process %d (%s) sharing same memory\n", task_pid_nr(p), p->comm); task_unlock(p); do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true); } [...] > Maybe we can use "struct mm_struct"->"bool chosen_by_oom_killer" and checking > for (current->mm && current->mm->chosen_by_oom_killer) than > test_thread_flag(TIF_MEMDIE) inside the memory allocator? Bool is not sufficient because killing some of the processes might be sufficient to resolve the OOM condition and the rest can survive. This is quite unlikely, all right, but not impossible. And then you would have a dangling chosen_by_oom_killer. So this should be a counter. So I think, but I have to think more about this, a proper way to handle this would be something like the following. The patch is obviously incomplete because memcg OOM killer would need the same treatment which calls for a common helper etc... But this is a real corner case. It would have to be current to trigger OOM killer and the userspace would have to be able to send the signal at the right moment... So I am even not sure this needs fixing. Are you able to trigger it? --- diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 5cfda39b3268..14128575fe86 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -428,6 +428,8 @@ void mark_oom_victim(struct task_struct *tsk) */ __thaw_task(tsk); atomic_inc(&oom_victims); + + atomic_inc(tsk->mm->under_oom); } /** @@ -436,6 +438,7 @@ void mark_oom_victim(struct task_struct *tsk) void exit_oom_victim(void) { clear_thread_flag(TIF_MEMDIE); + atomic_dec(current->active_mm->under_oom) if (!atomic_dec_return(&oom_victims)) wake_up_all(&oom_victims_wait); @@ -681,6 +684,16 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, } /* + * Processes which are sharing mm should die together. If one of them + * was OOM killed already we should shoot others as well. + */ + if (current->mm && atomic_read(current->mm->under_oom)) { + mark_oom_victim(current); + do_send_sig_info(SIGKILL, SEND_SIG_FORCED, current, true); + goto out; + } + + /* * Check if there were limitations on the allocation (only relevant for * NUMA) that may require different handling. */ -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>