Re: [PATCH v4] mm, oom: Fix race when selecting process to kill

Oleg Nesterov <oleg@xxxxxxxxxx> · Tue, 12 Nov 2013 21:01:56 +0100

On 11/11, Sameer Nanda wrote:
>
> The selection of the process to be killed happens in two spots:
> first in select_bad_process and then a further refinement by
> looking for child processes in oom_kill_process. Since this is
> a two step process, it is possible that the process selected by
> select_bad_process may get a SIGKILL just before oom_kill_process
> executes. If this were to happen, __unhash_process deletes this
> process from the thread_group list. This results in oom_kill_process
> getting stuck in an infinite loop when traversing the thread_group
> list of the selected process.
>
> Fix this race by adding a pid_alive check for the selected process
> with tasklist_lock held in oom_kill_process.

OK, looks correct to me. Thanks.
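
For anyone following along, the race described above can be modelled in
userspace. This is only a sketch under stated assumptions, not kernel code:
struct task, unhash(), walk_group(), walk_group_checked() and the step limit
are all made up for the illustration. It assumes that unlinking works like
list_del_rcu(), i.e. the removed entry keeps its own ->next pointer while its
neighbours stop pointing at it, and that the walk has the do/while shape of
while_each_thread(). The real fix additionally relies on tasklist_lock being
held across the check and the walk, which this single-threaded model cannot
show.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only what the model needs. */
struct task {
	struct task *next;	/* models thread_group.next */
	bool alive;		/* models what pid_alive() would report */
};

/*
 * Models __unhash_process() for one thread: the predecessor now skips
 * the task, but the task's own ->next is left untouched.
 */
static void unhash(struct task *prev, struct task *t)
{
	prev->next = t->next;
	t->alive = false;
}

/*
 * Models do { } while_each_thread(g, t): walk until we are back at g.
 * Returns the number of threads visited, or -1 if g was not reached
 * within 'limit' steps (the real loop has no such limit and would spin).
 */
static int walk_group(struct task *g, int limit)
{
	struct task *t = g;
	int n = 0;

	do {
		if (++n > limit)
			return -1;
		t = t->next;
	} while (t != g);

	return n;
}

/* Models the patch: refuse to walk a group whose leader is already gone. */
static int walk_group_checked(struct task *g, int limit)
{
	if (!g->alive)
		return 0;
	return walk_group(g, limit);
}
```

With three threads a -> b -> c -> a, walking from a visits three entries.
After a is unhashed, a walk that starts at a cycles through b and c and never
sees a again, while the checked variant bails out before walking.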

Yes, this is a step backwards; hopefully we will revert this patch soon.
I am starting to think something like while_each_thread_lame_but_safe()
makes sense before we really fix this nasty (and afaics not simple)
problem with while_each_thread() (which should die).

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .