On 2018/08/07 5:55, Michal Hocko wrote:
> On Tue 07-08-18 05:46:04, Tetsuo Handa wrote:
>> On 2018/08/07 5:34, Michal Hocko wrote:
>>> On Tue 07-08-18 05:26:23, Tetsuo Handa wrote:
>>>> On 2018/08/07 2:56, Michal Hocko wrote:
>>>>> So the oom victim indeed passed the above force path after the oom
>>>>> invocation. But later on hit the page fault path and that behaved
>>>>> differently and for some reason the force path hasn't triggered. I am
>>>>> wondering how we could hit the page fault path in the first place. The
>>>>> task is already killed! So what the hell is going on here.
>>>>>
>>>>> I must be missing something obvious here.
>>>>>
>>>> YOU ARE OBVIOUSLY MISSING MY MAIL!
>>>>
>>>> I already said this is the "mm, oom: task_will_free_mem(current) should
>>>> ignore MMF_OOM_SKIP for once." problem which you keep rejecting at
>>>> https://www.spinics.net/lists/linux-mm/msg133774.html .
>>>> And you again ignored my mail. Very sad...
>>>
>>> Your suggestion simply didn't make much sense. There is nothing like
>>> the first check being different from the rest.
>>>
>>
>> I don't think your patch is appropriate. It avoids hitting WARN(1) but
>> does not avoid unnecessary killing of OOM victims.
>>
>> If you look at https://syzkaller.appspot.com/text?tag=CrashLog&x=15a1c770400000 ,
>> you will notice that both 23766 and 23767 are killed due to
>> task_will_free_mem(current) == false. This is "unnecessary killing of
>> additional processes".
>
> Have you noticed the mere detail that the memcg has to kill any task
> attempting the charge because the hard limit is 0? There is simply no
> other way around. You cannot charge. There is no unnecessary killing.
> Full stop. We do allow temporary breach of the hard limit just to let
> the task die and uncharge on the way out.

select_bad_process() is being called just because task_will_free_mem()
returns false for an already-killed current thread which has not yet
completed __mmput(), and that is the bug. I'm saying that the OOM killer
should not give up as soon as MMF_OOM_SKIP is set.

static bool oom_has_pending_victims(struct oom_control *oc)
{
	struct task_struct *p, *tmp;
	bool ret = false;
	bool gaveup = false;

	if (is_sysrq_oom(oc))
		return false;
	/*
	 * Wait for pending victims until __mmput() completes or stalled
	 * too long.
	 */
	list_for_each_entry_safe(p, tmp, &oom_victim_list, oom_victim_list) {
		struct mm_struct *mm = p->signal->oom_mm;

		if (oom_unkillable_task(p, oc->memcg, oc->nodemask))
			continue;
		ret = true;
+		/*
+		 * Since memcg OOM allows forced charge, we can safely wait
+		 * until __mmput() completes.
+		 */
+		if (is_memcg_oom(oc))
+			return true;
#ifdef CONFIG_MMU
		/*
		 * Since the OOM reaper exists, we can safely wait until
		 * MMF_OOM_SKIP is set.
		 */
		if (!test_bit(MMF_OOM_SKIP, &mm->flags)) {
			if (!oom_reap_target) {
				get_task_struct(p);
				oom_reap_target = p;
				trace_wake_reaper(p->pid);
				wake_up(&oom_reaper_wait);
			}
			continue;
		}
#endif
		/* We can wait as long as OOM score is decreasing over time. */
		if (!victim_mm_stalling(p, mm))
			continue;
		gaveup = true;
		list_del(&p->oom_victim_list);
		/* Drop a reference taken by mark_oom_victim(). */
		put_task_struct(p);
	}
	if (gaveup)
		debug_show_all_locks();

	return ret;
}
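
To make the failure mode concrete, here is a condensed sketch of the
fast path being discussed, loosely based on mm/oom_kill.c around v4.18
(abbreviated; the exact tree in this thread may differ):

	/*
	 * In out_of_memory(): the fast path that is supposed to spare
	 * a dying current instead of selecting a new victim.
	 */
	if (task_will_free_mem(current)) {
		mark_oom_victim(current);
		wake_oom_reaper(current);
		return true;
	}

	/* In task_will_free_mem(): why that fast path stops firing. */
	static bool task_will_free_mem(struct task_struct *task)
	{
		struct mm_struct *mm = task->mm;

		/* Already past exit_mm(): no mm left to wait for. */
		if (!mm)
			return false;
		/* Neither a group exit nor a lone exiting thread. */
		if (!__task_will_free_mem(task))
			return false;
		/*
		 * The OOM reaper has already drained this mm, so the
		 * chances of it freeing much more are small -- hence
		 * "false" even when task is the still-exiting victim.
		 */
		if (test_bit(MMF_OOM_SKIP, &mm->flags))
			return false;
		...
	}

So once the reaper sets MMF_OOM_SKIP before __mmput() completes, a page
fault by the victim falls straight through to select_bad_process(),
which is the sequence visible in the syzkaller log above.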
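
And for the force path Michal refers to (the "temporary breach of the
hard limit"), a condensed sketch of the memcg charge bypass, loosely
based on try_charge() in mm/memcontrol.c around v4.18 (abbreviated):

	/*
	 * Dying or OOM-killed tasks are allowed to bypass the hard
	 * limit so that they can exit quickly and uncharge on the way
	 * out.
	 */
	if (unlikely(tsk_is_oom_victim(current) ||
		     fatal_signal_pending(current) ||
		     current->flags & PF_EXITING))
		goto force;
	...
force:
	/*
	 * The allocation either can't fail or will lead to more memory
	 * being freed very soon. Allow usage to go over the limit
	 * temporarily by force-charging it.
	 */
	page_counter_charge(&memcg->memory, nr_pages);
	if (do_memsw_account())
		page_counter_charge(&memcg->memsw, nr_pages);
	css_get_many(&memcg->css, nr_pages);

	return 0;

This bypass is what the is_memcg_oom(oc) early return in the function
above relies on: a memcg OOM victim can always complete its charge,
finish __mmput() and uncharge, so waiting for it is safe.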