On 2019/06/26 19:47, Michal Hocko wrote:
> On Wed 26-06-19 19:19:20, Tetsuo Handa wrote:
>> Is "mempolicy_nodemask_intersects(tsk) returning true when tsk already
>> passed mpol_put_task_policy(tsk) in do_exit()" what we want?
>>
>> If tsk is an already exit()ed thread group leader, that thread group is
>> needlessly selected by the OOM killer because
>> mempolicy_nodemask_intersects() returns true?
>
> I am sorry but I do not really see how this is related to this
> particular patch. Are you suggesting that has_intersects_mems_allowed is
> racy? More racy now?

I'm questioning the correctness of has_intersects_mems_allowed(). If
mask != NULL, mempolicy_nodemask_intersects() is called on each thread
in the "start" thread group, and as soon as
mempolicy_nodemask_intersects(tsk) returns true for some thread "tsk",
has_intersects_mems_allowed(start) returns true and "start" is
considered an OOM victim candidate. But if one of the threads in that
thread group has already passed mpol_put_task_policy(tsk) in do_exit()
(e.g. a dead thread group leader), mempolicy_nodemask_intersects(tsk)
returns true simply because tsk->mempolicy == NULL.

I don't know the mempolicy internals well, but can mempolicy be
configured on a per-thread basis? If the threads in the "start" thread
group cannot have different mempolicy->mode values, there is (mostly) no
need to use for_each_thread() in has_intersects_mems_allowed(). Instead,
we could use find_lock_task_mm(start) (provided that MMF_OOM_SKIP is
checked, like

  	/* p may not have freeable memory in nodemask */
  -	if (!is_memcg_oom(oc) && !has_intersects_mems_allowed(task, oc))
  +	if (!tsk_is_oom_victim(task) && !is_memcg_oom(oc) && !is_sysrq_oom(oc) &&
  +	    !has_intersects_mems_allowed(task, oc))
   		goto next;

) because thread groups for which find_lock_task_mm() returns NULL won't
be selected as an OOM victim candidate...