Re: [PATCH v2] mm,oom: Do not sleep with oom_lock held.

Michal Hocko <mhocko@xxxxxxx> · Tue, 8 Mar 2016 15:11:26 +0100

On Tue 08-03-16 19:59:15, Tetsuo Handa wrote:
> out_of_memory() can stall effectively forever if a SCHED_IDLE thread
> called out_of_memory() when there are !SCHED_IDLE threads running on
> the same CPU, for schedule_timeout_killable(1) cannot return shortly
> due to scheduling priority on CONFIG_PREEMPT_NONE=y kernels.
> 
> Operations with oom_lock held should complete as soon as possible
> because we might be preserving OOM condition for most of that period
> if we are in OOM condition. SysRq-f can't work if oom_lock is held.
> 
> It would be possible to boost scheduling priority of current thread
> while holding oom_lock, but priority of current thread might be
> manipulated by other threads after boosting. Unless we offload
> operations with oom_lock held to a dedicated kernel thread with high
> priority, addressing this problem using priority manipulation is racy.

As already mentioned whe you posted a similar patch before, this all is
true but the patch doesn't really fixes it so do we really want to have
it in the patch description?

> This patch brings schedule_timeout_killable(1) out of oom_lock.

I am not against this change, though, but can we please have a changelog
which doesn't pretend to be a fix while it is not? Also the changelog
should tell us why oom vs. exit races are not more probable now because
the OOM killing context is basically throttling all others right now
which won't be the case anymore. While I believe this shouldn't matter
it should be considered and described.

> This patch does not address OOM notifiers which are blockable.

Also does it really make sense to post the patch alone without
addressing this part?

> Long term we should focus on making the OOM context not preemptible.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxx>

my s-o-b is not really necessary, I have merely tried to show how your
previous patch could be simplified.

> ---
>  mm/oom_kill.c   | 14 +++++++-------
>  mm/page_alloc.c |  7 +++++++
>  2 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 5d5eca9..c84e784 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -901,15 +901,9 @@ bool out_of_memory(struct oom_control *oc)
>  		dump_header(oc, NULL, NULL);
>  		panic("Out of memory and no killable processes...\n");
>  	}
> -	if (p && p != (void *)-1UL) {
> +	if (p && p != (void *)-1UL)
>  		oom_kill_process(oc, p, points, totalpages, NULL,
>  				 "Out of memory");
> -		/*
> -		 * Give the killed process a good chance to exit before trying
> -		 * to allocate memory again.
> -		 */
> -		schedule_timeout_killable(1);
> -	}
>  	return true;
>  }
>  
> @@ -944,4 +938,10 @@ void pagefault_out_of_memory(void)
>  	}
>  
>  	mutex_unlock(&oom_lock);
> +
> +	/*
> +	 * Give the killed process a good chance to exit before trying
> +	 * to allocate memory again.
> +	 */
> +	schedule_timeout_killable(1);
>  }
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1993894..378a346 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2888,6 +2888,13 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  	}
>  out:
>  	mutex_unlock(&oom_lock);
> +	if (*did_some_progress && !page) {
> +		/*
> +		 * Give the killed process a good chance to exit before trying
> +		 * to allocate memory again.
> +		 */
> +		schedule_timeout_killable(1);
> +	}
>  	return page;
>  }
>  
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>