Re: [PATCH] mm, oom: show process exiting information in __oom_kill_process()

Yafang Shao <laoar.shao@xxxxxxxxx> · Mon, 20 Jul 2020 21:59:49 +0800

On Mon, Jul 20, 2020 at 9:11 PM Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2020/07/20 21:19, Yafang Shao wrote:
> > On Mon, Jul 20, 2020 at 7:06 PM Tetsuo Handa
> > <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> On 2020/07/20 19:36, Yafang Shao wrote:
> >>> On Mon, Jul 20, 2020 at 3:16 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >>>> I do agree that a silent bail out is not the best thing to do. The above
> >>>> message would be more useful if it also explained what the oom killer
> >>>> does (or does not):
> >>>>
> >>>>         "OOM victim %d (%s) is already exiting. Skip killing the task\n"
> >>>>
> >>>
> >>> Sure.
> >>
> >> This path is rarely hit because find_lock_task_mm() in oom_badness() from
> >> select_bad_process() in the next round of OOM killer will skip this task.
> >>
> >> Since we don't wake up the OOM reaper when hitting this path, unless __mmput()
> >> for this task itself immediately reclaims memory and updates the statistics
> >> counter, we just get two chunks of dump_header() messages and one OOM victim.
> >>
> >
> > Could you please explain more specifically why we would get two chunks of
> > dump_header()?
> > My understanding is that free_mm() happens between select_bad_process()
> > and __oom_kill_process(), as below:
> >
> > P1              Victim
> > select_bad_process()
> >     oom_badness()
> >         p = find_lock_task_mm()  # p isn't NULL
> >
> >                 __mmput()
> >
> >                     free_mm()
> > dump_header()  # dump once
> > __oom_kill_process()
> >     p = find_lock_task_mm(victim); # p is NULL now
> >
> > So where is another dump_header() ?
> >
>
> Starting __mmput() does not guarantee that memory is reclaimed immediately.
> Moreover, __mmput() might not even have started by the time the second chunk of
> dump_header() happens. The "OOM victim %d (%s) is already exiting." case only
> indicates that the victim's mm became NULL; there is no guarantee that memory is
> reclaimed (in order to avoid an OOM kill) by the time the next round hits.
>
> P1                                Victim1                              Victim2
>
> out_of_memory() {
>   select_bad_process() {
>     oom_badness() {
>       p = find_lock_task_mm() {
>         task_lock(victim);       // finds Victim1 because Victim1->mm != NULL.
>       }
>       get_task_struct(p);
>       task_unlock(p);
>     }
>   }
>   oom_kill_process() {
>     task_lock(victim);
>     task_unlock(victim);
>                                   do_exit() {
>     dump_header(oc, victim); // first dump_header() with Victim1 and Victim2
>     __oom_kill_process(victim, message) {
>                                     exit_mm() {
>                                       task_lock(current);
>                                       current->mm = NULL;
>                                       task_unlock(current);
>       p = find_lock_task_mm(victim);
>       put_task_struct(victim); // without killing Victim1 because p == NULL.
>     }
>   }
> }
> out_of_memory() {
>   select_bad_process() {
>     oom_badness() {
>       p = find_lock_task_mm() {
>         task_lock(victim);       // finds Victim2 because Victim2->mm != NULL.
>       }
>       get_task_struct(p);
>       task_unlock(p);
>     }
>   }
>                                       mmput() {
>                                         __mmput() {
>                                           uprobe_clear_state() {
>                                             // Might wait for delayed_uprobe_lock.
>                                           }
>   oom_kill_process() {
>     task_lock(victim);
>     task_unlock(victim);
>     dump_header(oc, victim); // second dump_header() with Victim2
>     __oom_kill_process(victim, message) {
>       p = find_lock_task_mm(victim);
>       pr_err("%s: Killed process %d (%s) "...); // first kill message.
>       put_task_struct(p);
>     }
>   }
> }
>                                           exit_mmap(); // Which frees memory.
>                                         }
>                                       }
>                                     }
>                                   }
>
> Maybe the better behavior is to restart out_of_memory() without dump_header()
> (we can remember in "struct oom_control" whether we already called dump_header()),
> with a last-second watermark check before select_bad_process() and after dump_header().

I understand what you mean now.
But I agree with Michal that this output won't be harmful in your case.
And for your case, I think Michal's suggestion to retry the victim
selection would be better.
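
To make the retry idea concrete, here is a rough userspace sketch of
"retry the victim selection, but print the header only once". It is not
kernel code and not a proposed patch: struct task, struct oom_ctl,
select_victim(), oom_kill_with_retry() and exit_pid_100() are all
hypothetical names made up for illustration, and the race() callback just
models the victim reaching exit_mm() between selection and the kill.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct task {
	int pid;
	const char *comm;
	long badness;
	bool has_mm;        /* false once the task has passed exit_mm() */
};

struct oom_ctl {
	bool header_dumped; /* remember that the header was already printed */
	int dumps;          /* how many times the header was printed */
	int skipped;        /* victims that were found already exiting */
};

/* Pick the task with the highest badness that still has an mm. */
static struct task *select_victim(struct task *tasks, size_t n)
{
	struct task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].has_mm)
			continue;
		if (!best || tasks[i].badness > best->badness)
			best = &tasks[i];
	}
	return best;
}

/*
 * Returns the pid that was killed, or -1 if no eligible task is left.
 * race() models whatever runs between selection and the kill attempt,
 * e.g. the victim reaching exit_mm() on another CPU. It may be NULL.
 */
static int oom_kill_with_retry(struct task *tasks, size_t n,
			       struct oom_ctl *oc,
			       void (*race)(struct task *victim))
{
	assert(oc != NULL);

	for (;;) {
		struct task *victim = select_victim(tasks, n);

		if (!victim)
			return -1;

		/* Dump the header at most once across all retries. */
		if (!oc->header_dumped) {
			printf("oom header (printed once)\n");
			oc->header_dumped = true;
			oc->dumps++;
		}

		if (race)
			race(victim);

		if (!victim->has_mm) {
			printf("OOM victim %d (%s) is already exiting. Skip killing the task\n",
			       victim->pid, victim->comm);
			oc->skipped++;
			continue; /* retry the selection instead of bailing out */
		}

		victim->has_mm = false;
		printf("Killed process %d (%s)\n", victim->pid, victim->comm);
		return victim->pid;
	}
}

/* Example race: the task with pid 100 exits right after being selected. */
static void exit_pid_100(struct task *victim)
{
	if (victim->pid == 100)
		victim->has_mm = false;
}
```

With two tasks where pid 100 has the higher badness, passing
exit_pid_100() as the race callback makes the first selection hit the
"already exiting" message, and the retry then kills the other task with
only one header printed. The last-second watermark check is left out of
this sketch.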

-- 
Thanks
Yafang