Re: [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation

Michal Hocko <mhocko@xxxxxxxx> · Tue, 10 Dec 2024 10:05:22 +0100

On Tue 10-12-24 05:31:30, Matthew Wilcox wrote:
> On Mon, Dec 09, 2024 at 06:39:31PM -0800, Alexei Starovoitov wrote:
> > +	if (preemptible() && !rcu_preempt_depth())
> > +		return alloc_pages_node_noprof(nid,
> > +					       GFP_NOWAIT | __GFP_ZERO,
> > +					       order);
> > +	return alloc_pages_node_noprof(nid,
> > +				       __GFP_TRYLOCK | __GFP_NOWARN | __GFP_ZERO,
> > +				       order);
> 
> [...]
> 
> > @@ -4009,7 +4018,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask, unsigned int order)
> >  	 * set both ALLOC_NON_BLOCK and ALLOC_MIN_RESERVE(__GFP_HIGH).
> >  	 */
> >  	alloc_flags |= (__force int)
> > -		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM));
> > +		(gfp_mask & (__GFP_HIGH | __GFP_KSWAPD_RECLAIM | __GFP_TRYLOCK));
> 
> It's not quite clear to me that we need __GFP_TRYLOCK to implement this.
> I was originally wondering if this wasn't a memalloc_nolock_save() /
> memalloc_nolock_restore() situation (akin to memalloc_nofs_save/restore),
> but I wonder if we can simply do:
> 
> 	if (!preemptible() || rcu_preempt_depth())
> 		alloc_flags |= ALLOC_TRYLOCK;

preemptible is unusable without CONFIG_PREEMPT_COUNT but I do agree that
__GFP_TRYLOCK is not really a preferred way to go forward. For 3
reasons. 

First I do not really like the name as it tells what it does rather than
how it should be used. This is a general pattern of many gfp flags
unfotrunatelly and historically it has turned out error prone. If a gfp
flag is really needed then something like __GFP_ANY_CONTEXT should be
used.  If the current implementation requires to use try_lock for
zone->lock or other changes is not an implementation detail but the user
should have a clear understanding that allocation is allowed from any
context (NMI, IRQ or otherwise atomic contexts).

Is there any reason why GFP_ATOMIC cannot be extended to support new
contexts? This allocation mode is already documented to be usable from
atomic contexts except from NMI and raw_spinlocks. But is it feasible to
extend the current implementation to use only trylock on zone->lock if
called from in_nmi() to reduce unexpected failures on contention for
existing users?

Third, do we even want such a strong guarantee in the generic page
allocator path and make it even more complex and harder to maintain? We
already have a precence in form of __alloc_pages_bulk which is a special
case allocator mode living outside of the page allocator path. It seems
that it covers most of your requirements except the fallback to the
regular allocation path AFAICS. Is this something you could piggy back
on?
-- 
Michal Hocko
SUSE Labs