On 2018/10/27 4:25, Michal Hocko wrote: >> out_of_memory() bails on task_will_free_mem(current), which >> specifically *excludes* already reaped tasks. Why are we then adding a >> separate check before that to bail on already reaped victims? > > 696453e66630a has introduced the bail out. > >> Do we want to bail if current is a reaped victim or not? >> >> I don't see how we could skip it safely in general: the current task >> might have been killed and reaped and gotten access to the memory >> reserve and still fail to allocate on its way out. It needs to kill >> the next task if there is one, or warn if there isn't another >> one. Because we're genuinely oom without reclaimable tasks. > > Yes, this would be the case for the global case which is a real OOM > situation. Memcg oom is somehow more relaxed because the oom is local. We can handle possibility of genuinely OOM without reclaimable tasks. Only __GFP_NOFAIL OOM has to select next OOM victim. There is no need to select next OOM victim unless __GFP_NOFAIL. Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip oom_reaped tasks") was too simple. On 2018/10/27 4:33, Michal Hocko wrote: > On Fri 26-10-18 21:25:51, Michal Hocko wrote: >> On Fri 26-10-18 10:25:31, Johannes Weiner wrote: > [...] >>> There is of course the scenario brought forward in this thread, where >>> multiple threads of a process race and the second one enters oom even >>> though it doesn't need to anymore. What the global case does to catch >>> this is to grab the oom lock and do one last alloc attempt. Should >>> memcg lock the oom_lock and try one more time to charge the memcg? >> >> That would be another option. I agree that making it more towards the >> global case makes it more attractive. My tsk_is_oom_victim is more >> towards "plug this particular case". > > Nevertheless let me emphasise that tsk_is_oom_victim will close the race > completely, while mem_cgroup_margin will always be racy. So the question > is whether we want to close the race because it is just too easy for > userspace to hit it or keep the global and memcg oom handling as close > as possible. > Yes, adding tsk_is_oom_victim(current) before calling out_of_memory() from both global OOM and memcg OOM paths can close the race completely. (But note that tsk_is_oom_victim(current) for global OOM path needs to check for __GFP_NOFAIL in order to handle genuinely OOM case.)