Re: [PATCH] mm: mempolicy: don't select exited threads as OOM victims

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Mon, 1 Jul 2019 22:56:12 +0900

On 2019/07/01 22:48, Michal Hocko wrote:
> On Mon 01-07-19 22:38:58, Tetsuo Handa wrote:
>> On 2019/07/01 22:17, Michal Hocko wrote:
>>> On Mon 01-07-19 22:04:22, Tetsuo Handa wrote:
>>>> But I realized that this patch was too optimistic. We need to wait for mm-less
>>>> threads until MMF_OOM_SKIP is set if the process was already an OOM victim.
>>>
>>> If the process is an oom victim then _all_ threads are so as well
>>> because that is the address space property. And we already do check that
>>> before reaching oom_badness IIRC. So what is the actual problem you are
>>> trying to solve here?
>>
>> I'm talking about behavioral change after tsk became an OOM victim.
>>
>> If tsk->signal->oom_mm != NULL, we have to wait for MMF_OOM_SKIP even if
>> tsk->mm == NULL. Otherwise, the OOM killer selects the next OOM victim as soon as
>> oom_unkillable_task() returned true because has_intersects_mems_allowed() returned
>> false because mempolicy_nodemask_intersects() returned false because all threads'
>> mm became NULL (despite tsk->signal->oom_mm != NULL).
> 
> OK, I finally got your point. It was not clear that you are referring to
> the code _after_ the patch you are proposing. You are indeed right that
> this would have a side effect that an additional victim could be
> selected even though the current process hasn't terminated yet. Sigh,
> another example of how subtle the whole thing is, so I retract my Ack and
> request a real-life example of where this matters before we think about
> a proper fix and make the code even more complex.
> 

Instead of checking for mm != NULL, can we move mpol_put_task_policy() from
do_exit() to __put_task_struct()? That change would (if it is safe to do)
prevent exited threads from setting mempolicy = NULL (and thereby confusing
mempolicy_nodemask_intersects() with mempolicy == NULL).
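
To illustrate the ordering I have in mind, here is a minimal userspace sketch.
It is a toy model, not kernel code: struct toy_task, toy_do_exit(),
toy_put_task_struct() and toy_policy_visible() are hypothetical names that only
mimic where the policy pointer gets cleared. The early_put flag selects between
the current placement (clear at exit) and the proposed one (clear at final
release). Whether the real change is safe is still an open question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_task {
	const int *mm;        /* stands in for tsk->mm */
	const int *mempolicy; /* stands in for tsk->mempolicy */
	int usage;            /* stands in for the task_struct refcount */
};

/*
 * Models thread exit. The mm pointer always goes away here.
 * With early_put == true the policy is dropped here as well
 * (the current placement); with false it is left alone
 * (the proposed placement).
 */
void toy_do_exit(struct toy_task *t, bool early_put)
{
	t->mm = NULL;
	if (early_put)
		t->mempolicy = NULL;
}

/* Models dropping one reference; the final drop clears the policy. */
void toy_put_task_struct(struct toy_task *t)
{
	if (--t->usage == 0)
		t->mempolicy = NULL;
}

/* True while a policy-based check could still see this thread's policy. */
bool toy_policy_visible(const struct toy_task *t)
{
	return t->mempolicy != NULL;
}
```

In this model, a task that starts with usage == 2 and exits with
early_put == false keeps its policy visible after toy_do_exit() and after the
first toy_put_task_struct(), and loses it only on the second (final) put. With
early_put == true the policy is gone as soon as toy_do_exit() returns.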