On Mon 10-09-18 20:27:21, Tetsuo Handa wrote:
> On 2018/09/10 18:54, Michal Hocko wrote:
> > On Sat 08-09-18 13:54:12, Tetsuo Handa wrote:
> > [...]
> >
> > I will not comment on the above because I have already done so and you
> > keep ignoring it so I will not waste my time again.
>
> Then, your NACK no longer stands.

And how exactly have you reached that conclusion? Nothing has really
changed, except that you keep pushing this crap no matter what you keep
hearing. You obviously do not worry about putting words into my mouth.

> >> (2) sysctl_oom_kill_allocating_task path can be selected forever
> >>     because it did not check for MMF_OOM_SKIP.
> >
> > Why is that a problem? sysctl_oom_kill_allocating_task doesn't have any
> > well-defined semantics. It is a gross hack to save a long and expensive
> > oom victim selection. If we were really strict, we should not even allow
> > anybody but the allocating task to be killed. So selecting it multiple
> > times doesn't sound harmful to me.
>
> After the current thread was selected as an OOM victim by that code path
> and the OOM reaper reaped the current thread's memory, the OOM killer has
> to select the next OOM victim,

And how have you reached this conclusion? What kind of "kill the
allocating task" semantic really implies this?

> for such a situation means that the current thread cannot bail out due to
> a __GFP_NOFAIL allocation. That is, it is similar to the
>
>     if (tsk_is_oom_victim(current) && !(oc->gfp_mask & __GFP_NOFAIL))
>         return true;
>
> change which you ignored. That is, when
>
>     If current thread is an OOM victim, it is guaranteed to make forward
>     progress (unless __GFP_NOFAIL) by failing that allocation attempt
>     after trying memory reserves. The OOM path does not need to do
>     anything at all.
>
> fails due to __GFP_NOFAIL, sysctl_oom_kill_allocating_task has to select
> the next OOM victim.

this doesn't make any sense

> >> (3) CONFIG_MMU=n kernels might livelock because nobody except the
> >>     is_global_init() case in __oom_kill_process() sets MMF_OOM_SKIP.
> >
> > And now the obligatory question. Is this a real problem?
>
> I SAID "POSSIBLE BUGS". That you have never heard of it is not proof that
> the problem is not occurring in the world. Not everybody is skillful
> enough to report OOM (or low memory) problems to you!

No, we are not making the code overly complex or convoluted for
theoretically possible issues we have never heard of before.

> >>     which prevent proof of "the forward progress guarantee"
> >>
> >> and adds one optimization
> >>
> >> (4) oom_evaluate_task() was calling oom_unkillable_task() twice because
> >>     oom_evaluate_task() needs to check for !MMF_OOM_SKIP and
> >>     oom_task_origin() tasks before calling oom_badness().
> >
> > ENOPARSE
>
> Not difficult to parse at all.
>
>     oom_evaluate_task() {
>
>         if (oom_unkillable_task(task, NULL, oc->nodemask))
>             goto next;
>
>         if (!is_sysrq_oom(oc) && tsk_is_oom_victim(task)) {
>             if (test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags))
>                 goto next;
>             goto abort;
>         }
>
>         if (oom_task_origin(task)) {
>             points = ULONG_MAX;
>             goto select;
>         }
>
>         points = oom_badness(task, NULL, oc->nodemask, oc->totalpages) {
>
>             if (oom_unkillable_task(p, memcg, nodemask))
>                 return 0;
>
>         }
>     }
>
> By moving oom_task_origin() to inside oom_badness(), and by bringing the
> !MMF_OOM_SKIP test earlier, we can eliminate the
>
>     oom_unkillable_task(task, NULL, oc->nodemask)
>
> test in oom_evaluate_task().

And so what?
-- 
Michal Hocko
SUSE Labs
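
For reference, a rough sketch of the rearrangement that point (4) describes,
written in the same pseudocode style as the snippet quoted above. This is
illustrative only, not a proposed patch and not actual kernel code; the
scoring body of oom_badness() is elided:

    oom_evaluate_task() {

        /* MMF_OOM_SKIP / existing-victim check done first */
        if (!is_sysrq_oom(oc) && tsk_is_oom_victim(task)) {
            if (test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags))
                goto next;    /* already reaped victim, keep scanning */
            goto abort;       /* an unreaped victim still exists */
        }

        /* the only oom_unkillable_task() check left is inside oom_badness() */
        points = oom_badness(task, NULL, oc->nodemask, oc->totalpages) {

            if (oom_unkillable_task(p, memcg, nodemask))
                return 0;

            if (oom_task_origin(p))
                return ULONG_MAX;    /* oom_task_origin() folded in here */

            /* ... usual badness scoring ... */
        }
    }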