On Fri 11-01-19 23:31:18, Tetsuo Handa wrote:
> On 2019/01/11 22:34, Michal Hocko wrote:
> > On Fri 11-01-19 21:40:52, Tetsuo Handa wrote:
> > [...]
> >> Did you notice that there is no
> >>
> >> "Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n"
> >>
> >> line between
> >>
> >> [ 71.304703][ T9694] Memory cgroup out of memory: Kill process 9692 (a.out) score 904 or sacrifice child
> >>
> >> and
> >>
> >> [ 71.309149][ T54] oom_reaper: reaped process 9750 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:185532kB
> >>
> >> ? Then, you will find that [ T9694] failed to reach for_each_process(p) loop inside
> >> __oom_kill_process() in the first round of out_of_memory() call because
> >> find_lock_task_mm() == NULL at __oom_kill_process() because Ctrl-C made that victim
> >> complete exit_mm() before find_lock_task_mm() is called.
> >
> > OK, so we haven't killed anything because the victim has exited by the
> > time we wanted to do so. We still have other tasks sharing that mm
> > pending and not killed because nothing has killed them yet, right?
>
> The OOM killer invoked by [ T9694] called printk() but didn't kill anything.
> Instead, SIGINT from Ctrl-C killed all thread groups sharing current->mm.

I still do not get it. Those other processes are not sharing signals.
Or is it due to injecting the signal to all of them with the proper
timing?

> > How come the oom reaper could act on this oom event at all then?
> >
> > What am I missing?
> >
>
> The OOM killer invoked by [ T9750] did not call printk() but hit
> task_will_free_mem(current) in out_of_memory() and invoked the OOM reaper,
> without calling mark_oom_victim() on all thread groups sharing current->mm.
> Did you notice that I wrote that

OK, now it finally starts making sense to me. I got hung up on
find_lock_task_mm failing in __oom_kill_process because we do see
"Memory cgroup out of memory" and that happens _after_ task_will_free_mem.
So the whole oom_reaper scenario didn't make much sense to me.

> Since mm-oom-marks-all-killed-tasks-as-oom-victims.patch does not call mark_oom_victim()
> when task_will_free_mem() == true,
>
> ? :-(

No, I got lost in your writeup.

While the task_will_free_mem case is fixable, fixing it would get us to
even uglier code, so I agree that the approach taken by my two patches is
not feasible.

I really wanted to have this heuristic based on the oom victim rather
than on a pending signal, because one lesson I've learned over time is
that checks for fatal signals can lead to odd corner cases. Memcg is less
prone to those issues because we can bypass the charge, but still.

Anyway, could you update your patch and abstract
	if (unlikely(tsk_is_oom_victim(current) ||
		     fatal_signal_pending(current) ||
		     current->flags & PF_EXITING))

in try_charge and reuse it in mem_cgroup_out_of_memory under the
oom_lock, with an explanation, please?

Andrew, please drop my 2 patches.
-- 
Michal Hocko
SUSE Labs