Re: [patch 08/12] mm: page_alloc: wait for OOM killer progress before retrying

Johannes Weiner <hannes@xxxxxxxxxxx> · Thu, 26 Mar 2015 14:17:00 -0400

On Thu, Mar 26, 2015 at 04:38:47PM +0100, Michal Hocko wrote:
> On Thu 26-03-15 11:23:43, Johannes Weiner wrote:
> > On Thu, Mar 26, 2015 at 03:32:23PM +0100, Michal Hocko wrote:
> > > On Thu 26-03-15 07:24:45, Johannes Weiner wrote:
> > > > On Wed, Mar 25, 2015 at 11:15:48PM +0900, Tetsuo Handa wrote:
> > > > > Johannes Weiner wrote:
> > > [...]
> > > > > >  	/*
> > > > > > -	 * Acquire the oom lock.  If that fails, somebody else is
> > > > > > -	 * making progress for us.
> > > > > > +	 * This allocating task can become the OOM victim itself at
> > > > > > +	 * any point before acquiring the lock.  In that case, exit
> > > > > > +	 * quickly and don't block on the lock held by another task
> > > > > > +	 * waiting for us to exit.
> > > > > >  	 */
> > > > > > -	if (!mutex_trylock(&oom_lock)) {
> > > > > > -		*did_some_progress = 1;
> > > > > > -		schedule_timeout_uninterruptible(1);
> > > > > > -		return NULL;
> > > > > > +	if (test_thread_flag(TIF_MEMDIE) || mutex_lock_killable(&oom_lock)) {
> > > > > > +		alloc_flags |= ALLOC_NO_WATERMARKS;
> > > > > > +		goto alloc;
> > > > > >  	}
> > > > > 
> > > > > When a thread group has 1000 threads and most of them are doing memory allocation
> > > > > request, all of them will get fatal_signal_pending() == true when one of them are
> > > > > chosen by OOM killer.
> > > > > This code will allow most of them to access memory reserves, won't it?
> > > > 
> > > > Ah, good point!  Only TIF_MEMDIE should get reserve access, not just
> > > > any dying thread.  Thanks, I'll fix it in v2.
> > > 
> > > Do you plan to post this v2 here for review?
> > 
> > Yeah, I was going to wait for feedback to settle before updating the
> > code.  But I was thinking something like this?
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 9ce9c4c083a0..106793a75461 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2344,7 +2344,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> >  	 * waiting for us to exit.
> >  	 */
> >  	if (test_thread_flag(TIF_MEMDIE) || mutex_lock_killable(&oom_lock)) {
> > -		alloc_flags |= ALLOC_NO_WATERMARKS;
> > +		if (test_thread_flag(TIF_MEMDIE))
> > +			alloc_flags |= ALLOC_NO_WATERMARKS;
> >  		goto alloc;
> >  	}
> 
> OK, I have expected something like this. I understand why you want to
> retry inside this function. But I would prefer if gfp_to_alloc_flags was
> used here so that we do not have that TIF_MEMDIE logic duplicated at two
> places.

I don't think that's a good idea.  gfp_to_alloc_flags() reinitializes
the entire allocation context from the gfp flags and the task state,
but the only thing we care about, which can actually change here, is
TIF_MEMDIE.  This is perfectly obvious and expected in the OOM kill
allocation function, which makes my code self-documenting, whereas if
you use gfp_to_alloc_flags() you have to explain why it is called.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>