On 2020/07/16 2:30, David Rientjes wrote:
> But regardless of whether we present previous data to the user in the
> kernel log or not, we've determined that oom killing a process is a
> serious matter and go to any lengths possible to avoid having to do it.
> For us, that means waiting until the "point of no return" to either go
> ahead with oom killing a process or aborting and retrying the charge.
>
> I don't think moving the mem_cgroup_margin() check to out_of_memory()
> right before printing the oom info and killing the process is a very
> invasive patch. Any strong preference against doing it that way? I think
> moving the check as late as possible to save a process from being killed
> when racing with an exiter or killed process (including perhaps current)
> has a pretty clear motivation.
>

How about ignoring MMF_OOM_SKIP for once? I think this has almost the same
effect as moving the mem_cgroup_margin() check to out_of_memory() right
before printing the oom info and killing the process.

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 48e0db54d838..88170af3b9eb 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -322,7 +322,8 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 	 * any memory is quite low.
 	 */
 	if (!is_sysrq_oom(oc) && tsk_is_oom_victim(task)) {
-		if (test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags))
+		if (test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags) &&
+		    !test_and_clear_bit(MMF_OOM_REAP_QUEUED, &task->signal->oom_mm->flags))
 			goto next;
 		goto abort;
 	}
@@ -658,7 +659,8 @@ static int oom_reaper(void *unused)
 static void wake_oom_reaper(struct task_struct *tsk)
 {
 	/* mm is already queued? */
-	if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
+	if (test_bit(MMF_OOM_SKIP, &tsk->signal->oom_mm->flags) ||
+	    test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
 		return;
 
 	get_task_struct(tsk);
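
To make the "for once" part easier to follow, here is a rough userspace model of the two conditions touched by the diff. It is only a sketch, not kernel code: the SKETCH_* constants, the sketch_* helpers and the verdict enum are hypothetical names made up for illustration, the bit operations are plain non-atomic stand-ins for test_and_set_bit() and test_and_clear_bit(), and it assumes the mm was queued for reaping before MMF_OOM_SKIP got set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel's mm flag bits. */
enum {
	SKETCH_OOM_SKIP        = 1u << 0,	/* models MMF_OOM_SKIP */
	SKETCH_OOM_REAP_QUEUED = 1u << 1,	/* models MMF_OOM_REAP_QUEUED */
};

enum verdict { VERDICT_NEXT, VERDICT_ABORT };

/* Non-atomic model of test_and_set_bit(): set the bit, return the old value. */
static bool sketch_test_and_set(unsigned int *flags, unsigned int bit)
{
	bool was_set = (*flags & bit) != 0;

	*flags |= bit;
	return was_set;
}

/* Non-atomic model of test_and_clear_bit(): clear the bit, return the old value. */
static bool sketch_test_and_clear(unsigned int *flags, unsigned int bit)
{
	bool was_set = (*flags & bit) != 0;

	*flags &= ~bit;
	return was_set;
}

/* Models the patched check in oom_evaluate_task() for an existing victim. */
static enum verdict sketch_evaluate_victim(unsigned int *flags)
{
	if ((*flags & SKETCH_OOM_SKIP) &&
	    !sketch_test_and_clear(flags, SKETCH_OOM_REAP_QUEUED))
		return VERDICT_NEXT;	/* skip this victim, keep scanning */
	return VERDICT_ABORT;		/* abort the selection and wait */
}

/* Models the patched wake_oom_reaper(); returns true if the mm gets queued. */
static bool sketch_wake_reaper(unsigned int *flags)
{
	if ((*flags & SKETCH_OOM_SKIP) ||
	    sketch_test_and_set(flags, SKETCH_OOM_REAP_QUEUED))
		return false;
	return true;
}

/* Walks one victim through the sequence the patch is aiming at. */
static void sketch_demo(void)
{
	unsigned int flags = 0;

	assert(sketch_wake_reaper(&flags));	/* first wake-up queues the mm */
	flags |= SKETCH_OOM_SKIP;		/* SKIP gets set later on */
	assert(sketch_evaluate_victim(&flags) == VERDICT_ABORT); /* ignored once */
	assert(sketch_evaluate_victim(&flags) == VERDICT_NEXT);  /* then honoured */
	assert(!sketch_wake_reaper(&flags));	/* no re-queue once SKIP is set */
}
```

In this model, the first scan after SKIP is set consumes the REAP_QUEUED bit and still aborts, so the charge gets one more retry before another victim is considered. The extra SKIP test in the wake-up path keeps REAP_QUEUED from being set again, so the grace period is not repeated for the same mm.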