Re: [PATCH] mm, oom: show process exiting information in __oom_kill_process()

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Mon, 20 Jul 2020 22:11:07 +0900

On 2020/07/20 21:19, Yafang Shao wrote:
> On Mon, Jul 20, 2020 at 7:06 PM Tetsuo Handa
> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On 2020/07/20 19:36, Yafang Shao wrote:
>>> On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>>>> I do agree that a silent bail out is not the best thing to do. The above
>>>> message would be more useful if it also explained what the oom killer
>>>> does (or does not):
>>>>
>>>>         "OOM victim %d (%s) is already exiting. Skip killing the task\n"
>>>>
>>>
>>> Sure.
>>
>> This path is rarely hit because find_lock_task_mm() in oom_badness() from
>> select_bad_process() in the next round of OOM killer will skip this task.
>>
>> Since we don't wake up the OOM reaper when hitting this path, unless __mmput()
>> for this task itself immediately reclaims memory and updates the statistics
>> counter, we just get two chunks of dump_header() messages and one OOM victim.
>>
> 
> Could you please explain more specifically why we will get two chunks of
> dump_header()?
> My understanding is that free_mm() happens between select_bad_process()
> and __oom_kill_process(), as below:
> 
> P1              Victim
> select_bad_process()
>     oom_badness()
>         p = find_lock_task_mm()  # p isn't NULL
> 
>                 __mmput()
> 
>                     free_mm()
> dump_header()  # dump once
> __oom_kill_process()
>     p = find_lock_task_mm(victim); # p is NULL now
> 
> So where is another dump_header() ?
> 

The start of __mmput() does not guarantee that memory is reclaimed immediately.
Moreover, __mmput() might not even have started by the time the second chunk of
dump_header() output is printed. The "OOM victim %d (%s) is already exiting."
case only indicates that the victim's mm has become NULL; there is no guarantee
that memory has been reclaimed (so as to avoid another OOM kill) by the time
the next round starts.

P1                                Victim1                              Victim2

out_of_memory() {
  select_bad_process() {
    oom_badness() {
      p = find_lock_task_mm() {
        task_lock(victim);       // finds Victim1 because Victim1->mm != NULL.
      }
      get_task_struct(p);
      task_unlock(p);
    }
  }
  oom_kill_process() {
    task_lock(victim);
    task_unlock(victim);
                                  do_exit() {
    dump_header(oc, victim); // first dump_header() with Victim1 and Victim2
    __oom_kill_process(victim, message) {
                                    exit_mm() {
                                      task_lock(current);
                                      current->mm = NULL;
                                      task_unlock(current);
      p = find_lock_task_mm(victim);
      put_task_struct(victim); // without killing Victim1 because p == NULL.
    }
  }
}
out_of_memory() {
  select_bad_process() {
    oom_badness() {
      p = find_lock_task_mm() {
        task_lock(victim);       // finds Victim2 because Victim2->mm != NULL.
      }
      get_task_struct(p);
      task_unlock(p);
    }
  }
                                      mmput() {
                                        __mmput() {
                                          uprobe_clear_state() {
                                            // Might wait for delayed_uprobe_lock.
                                          }
  oom_kill_process() {
    task_lock(victim);
    task_unlock(victim);
    dump_header(oc, victim); // second dump_header() with Victim2
    __oom_kill_process(victim, message) {
      p = find_lock_task_mm(victim);
      pr_err("%s: Killed process %d (%s) "...); // first kill message.
      put_task_struct(p);
    }
  }
}
                                          exit_mmap(); // Which frees memory.
                                        }
                                      }
                                    }
                                  }

Maybe the better behavior is to restart out_of_memory() without calling
dump_header() again (we can record in "struct oom_control" whether dump_header()
was already called), with a last-second watermark check before
select_bad_process() and after dump_header().
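
To make the restart idea concrete, here is a rough userspace model of it. This
is only a sketch and not kernel code: "struct oom_control_model", the
"header_dumped" flag, the "exits_before_kill" field, and all function names are
hypothetical and exist only for illustration. The model assumes that a victim
losing its mm frees no memory by itself, as in the sequence above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every name below is hypothetical. */
struct task {
	int pid;
	bool has_mm;            /* models task->mm != NULL */
	bool exits_before_kill; /* models exit_mm() racing with the OOM killer */
	bool killed;
};

struct oom_control_model {
	long free_pages;    /* models the current amount of free memory */
	long min_pages;     /* models the watermark */
	bool header_dumped; /* hypothetical flag: dump_header() already ran */
	int dumps;          /* how many times the header was printed */
	int kills;          /* how many victims were actually killed */
};

/* Stand-in for the last-second watermark check. */
static bool watermark_ok(const struct oom_control_model *oc)
{
	return oc->free_pages >= oc->min_pages;
}

/* Pick the first task that still owns an mm and was not killed yet. */
static struct task *select_bad_process_model(struct task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].has_mm && !tasks[i].killed)
			return &tasks[i];
	return NULL;
}

/*
 * Returns true if a victim was killed or memory became available,
 * false if no eligible victim is left.
 */
static bool out_of_memory_model(struct oom_control_model *oc,
				struct task *tasks, size_t n)
{
	assert(oc && tasks);
	for (;;) {
		/* Check the watermark before selecting a victim. */
		if (watermark_ok(oc))
			return true;

		struct task *victim = select_bad_process_model(tasks, n);
		if (!victim)
			return false;

		if (!oc->header_dumped) {
			oc->header_dumped = true;
			oc->dumps++;
			/* Printing is slow; check the watermark again. */
			if (watermark_ok(oc))
				return true;
		}

		/* The victim may drop its mm before we get to kill it. */
		if (victim->exits_before_kill)
			victim->has_mm = false;

		if (!victim->has_mm)
			continue; /* already exiting: restart, no second dump */

		victim->killed = true;
		oc->kills++;
		return true;
	}
}
```

With two tasks where the first one drops its mm before the kill, this model
prints the header once and kills only the second task, instead of producing
two header dumps for one kill.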