Michal Hocko wrote:
> On Fri 17-06-16 20:38:01, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > > > Anyway, would you be OK with the patch if I added the current->mm check
> > > > > and resolve its necessity in a separate patch?
> > > >
> > > > Please correct task_will_free_mem() in oom_kill_process() as well.
> > >
> > > We cannot hold task_lock over all task_will_free_mem I am even not sure
> > > we have to develop an elaborate way to make it raceless just for the nommu
> > > case. The current case is simple as we cannot race here. Is that
> > > sufficient for you?
> >
> > We can use find_lock_task_mm() inside mark_oom_victim().
> > That is, call wake_oom_reaper() from mark_oom_victim() like
> >
> > void mark_oom_victim(struct task_struct *tsk, bool can_use_oom_reaper)
> > {
> >         WARN_ON(oom_killer_disabled);
> >         /* OOM killer might race with memcg OOM */
> >         tsk = find_lock_task_mm(tsk);
> >         if (!tsk)
> >                 return;
> >         if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE)) {
> >                 task_unlock(tsk);
> >                 return;
> >         }
> >         task_unlock(tsk);
> >         atomic_inc(&tsk->signal->oom_victims);
> >         /*
> >          * Make sure that the task is woken up from uninterruptible sleep
> >          * if it is frozen because OOM killer wouldn't be able to free
> >          * any memory and livelock. freezing_slow_path will tell the freezer
> >          * that TIF_MEMDIE tasks should be ignored.
> >          */
> >         __thaw_task(tsk);
> >         atomic_inc(&oom_victims);
> >         if (can_use_oom_reaper)
> >                 wake_oom_reaper(tsk);
> > }
> >
> > and move mark_oom_victim() by normal path to after task_unlock(victim).
> >
> >         do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
> > -       mark_oom_victim(victim);
> >
> > -       if (can_oom_reap)
> > -               wake_oom_reaper(victim);
> > +       wake_oom_reaper(victim, can_oom_reap);
>
> I do not like this because then we would have to check the reapability
> from inside the oom_reaper again.

I didn't understand why you think so. But strictly speaking, the can_oom_reap
calculation in oom_kill_process() is always racy, and [PATCH 10/10] is not
safe.

  CPU0 = memory allocating task, CPU1 = kthread, CPU2 = OOM victim

  CPU1: Calls use_mm(victim->mm).
  CPU1: Starts some worker.
  CPU0: Enters out_of_memory().
  CPU0: Enters oom_kill_process().
  CPU1: Finishes some worker.
  CPU0: Calls rcu_read_lock().
  CPU0: Sets can_oom_reap = false due to process_shares_mm() &&
        !same_thread_group() && (p->flags & PF_KTHREAD).
  CPU1: Calls unuse_mm(victim->mm).
  CPU0: Continues scanning other processes.
  CPU1: Calls mmput(victim->mm).
  CPU0: Sends SIGKILL to victim.
  CPU0: Calls rcu_read_unlock().
  CPU0: Leaves oom_kill_process().
  CPU2: Calls do_exit().
  CPU0: Leaves out_of_memory().
  CPU2: Sets victim->mm = NULL from exit_mm().
  CPU2: Calls mmput() from exit_mm().
  CPU2: __mmput() is called because victim was the last user.
  CPU0: Enters out_of_memory().
  CPU0: oom_scan_process_thread() returns OOM_SCAN_ABORT.
  CPU0: Leaves out_of_memory().
  CPU2: __mmput() stalls, but the oom_reaper is not called.

For correctness, can_oom_reap needs to be calculated inside the oom_reaper
(a rough sketch of such a check is appended below).

> But let me ask again. Does this really matter so much just because of
> nommu where we can fall in different traps? Can we simply focus on mmu
> (aka vast majority of cases) make it work reliably and see what we can
> do with nommu later?

To me, a timeout-based approach is sufficient for handling any traps that
hit nommu kernels after the OOM killer is invoked.

Anyway, I don't like this series because it ignores theoretical cases.
I can't make progress as long as you keep repeating "does it really
matter/occur". Please go ahead without a Reviewed-by: or Acked-by: from me.
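
Just to illustrate what I mean by calculating it inside the oom_reaper, below
is a rough, untested sketch assuming the mm/oom_kill.c context of this series;
the helper name mm_is_reapable() and its call site (the oom_reaper kthread,
right before it starts unmapping) are made up for illustration and are not
from any posted patch.

/*
 * Untested sketch: instead of trusting the can_oom_reap snapshot taken
 * while oom_kill_process() was scanning the process list, redo the check
 * from the oom_reaper kthread right before unmapping the victim's memory.
 */
static bool mm_is_reapable(struct task_struct *victim, struct mm_struct *mm)
{
        struct task_struct *p;
        bool reapable = true;

        rcu_read_lock();
        for_each_process(p) {
                if (!process_shares_mm(p, mm))
                        continue;
                if (same_thread_group(p, victim))
                        continue;
                /*
                 * A kernel thread may still be inside use_mm(mm), and an
                 * OOM_SCORE_ADJ_MIN task will not be killed, so unmapping
                 * the address space under them is not safe.
                 */
                if ((p->flags & PF_KTHREAD) ||
                    p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
                        reapable = false;
                        break;
                }
        }
        rcu_read_unlock();

        return reapable;
}

If __oom_reap_task() called something like this right before trying to take
mmap_sem for read, and gave up (or retried later) when it returned false, a
kthread which has already done unuse_mm()/mmput() by that time would no
longer prevent reaping, and the stale snapshot shown above would not matter.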