Re: [PATCH v2] mm,oom: speed up select_bad_process() loop.

Michal Hocko <mhocko@xxxxxxxxxx> · Wed, 18 May 2016 14:51:38 +0200

On Wed 18-05-16 21:20:24, Tetsuo Handa wrote:
> Since commit 3a5dda7a17cf3706 ("oom: prevent unnecessary oom kills or
> kernel panics"), select_bad_process() is using for_each_process_thread().
> 
> Since oom_unkillable_task() scans all threads in the caller's thread group
> and oom_task_origin() scans signal_struct of the caller's thread group, we
> don't need to call oom_unkillable_task() and oom_task_origin() on each
> thread. Also, since !mm test will be done later at oom_badness(), we don't
> need to do !mm test on each thread. Therefore, we only need to do
> TIF_MEMDIE test on each thread.
> 
> If we track number of TIF_MEMDIE threads inside signal_struct, we don't
> need to do TIF_MEMDIE test on each thread. This will allow
> select_bad_process() to use for_each_process().

I am wondering whether signal_struct is the best way forward. The oom
killing is more about mm_struct than anything else. We can record that
the mm was oom killed in mm->flags (similar to MMF_OOM_REAPED). I guess
this would require more work at this stage so maybe starting with signal
struct is not that bad afterall. Just thinking...

> This patch adds a counter to signal_struct for tracking how many
> TIF_MEMDIE threads are in a given thread group, and check it at
> oom_scan_process_thread() so that select_bad_process() can use
> for_each_process() rather than for_each_process_thread().

In general I do agree that for_each_process is preferable. I guess you
are missing one case here, though (or maybe just forgot to refresh the
patch because the changelog mentions !mm test):

[...]
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index c0e37dd..1ac24e8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -283,10 +283,8 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
>  	 * This task already has access to memory reserves and is being killed.
>  	 * Don't allow any other task to have access to the reserves.
>  	 */
> -	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
> -		if (!is_sysrq_oom(oc))
> -			return OOM_SCAN_ABORT;
> -	}
> +	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims))
> +		return OOM_SCAN_ABORT;
>  	if (!task->mm)
>  		return OOM_SCAN_CONTINUE;

So let's say that the group leader is gone, now you would skip the whole
thread group AFAICS. This is an easy way to hide from the OOM killer,
unless I've missed something.

You can safely drop if (!task->mm) check because oom_badness does
find_lock_task_mm so we will catch this case anyway.

Other than that the patch looks good to me.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>