Re: [RFC-PATCH 1/2] mm: Add __GFP_NO_LOCKS flag

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Thu, 13 Aug 2020 11:52:57 -0700

On Thu, Aug 13, 2020 at 08:26:18PM +0200, peterz@xxxxxxxxxxxxx wrote:
> On Thu, Aug 13, 2020 at 04:34:57PM +0200, Thomas Gleixner wrote:
> > Michal Hocko <mhocko@xxxxxxxx> writes:
> > > On Thu 13-08-20 15:22:00, Thomas Gleixner wrote:
> > >> It basically requires to convert the wait queue to something else. Is
> > >> the waitqueue strict single waiter?
> > >
> > > I would have to double check. From what I remember only kswapd should
> > > ever sleep on it.
> > 
> > That would make it trivial as we could simply switch it over to rcu_wait.
> > 
> > >> So that should be:
> > >> 
> > >> 	if (!preemptible() && gfp == GFP_RT_NOWAIT)
> > >> 
> > >> which is limiting the damage to those callers which hand in
> > >> GFP_RT_NOWAIT.
> > >> 
> > >> lockdep will yell at invocations with gfp != GFP_RT_NOWAIT when it hits
> > >> zone->lock in the wrong context. And we want to know about that so we
> > >> can look at the caller and figure out how to solve it.
> > >
> > > Yes, that would have to somehow need to annotate the zone_lock to be ok
> > > in those paths so that lockdep doesn't complain.
> > 
> > That opens the worst of all cans of worms. If we start this here then
> > Joe programmer and his dog will use these lockdep annotation to evade
> > warnings and when exposed to RT it will fall apart in pieces. Just that
> > at that point Joe programmer moved on to something else and the usual
> > suspects can mop up the pieces. We've seen that all over the place and
> > some people even disable lockdep temporarily because annotations don't
> > help.
> > 
> > PeterZ might have opinions about that too I suspect.
> 
> PeterZ is mightily confused by all of this -- also heat induced brain
> melt.
> 
> I thought the rule was:
> 
>  - No allocators (alloc/free) inside raw_spinlock_t, full-stop.
> 
> Why are we trying to craft an exception?

So that we can reduce post-grace-period cache misses by a factor of
eight when invoking RCU callbacks.  This reduction in cache misses also
makes it more difficult to overrun RCU with floods of either call_rcu()
or kfree_rcu() invocations.

The idea is to allocate page-sized arrays of pointers so that the callback
invocation can sequence through the array instead of walking a linked
list, hence the reduction in cache misses.

If the allocation fails, for example, during OOM events, we fall back to
the linked-list approach.  So, as with much of the rest of the kernel,
under OOM we just run a bit slower.

							Thanx, Paul