On Thu, Aug 13, 2020 at 08:26:18PM +0200, peterz@xxxxxxxxxxxxx wrote:
> On Thu, Aug 13, 2020 at 04:34:57PM +0200, Thomas Gleixner wrote:
> > Michal Hocko <mhocko@xxxxxxxx> writes:
> > > On Thu 13-08-20 15:22:00, Thomas Gleixner wrote:
> > >> It basically requires to convert the wait queue to something else. Is
> > >> the waitqueue strict single waiter?
> > >
> > > I would have to double check. From what I remember only kswapd should
> > > ever sleep on it.
> >
> > That would make it trivial as we could simply switch it over to rcu_wait.
> >
> > >> So that should be:
> > >>
> > >> 	if (!preemptible() && gfp == GFP_RT_NOWAIT)
> > >>
> > >> which is limiting the damage to those callers which hand in
> > >> GFP_RT_NOWAIT.
> > >>
> > >> lockdep will yell at invocations with gfp != GFP_RT_NOWAIT when it hits
> > >> zone->lock in the wrong context. And we want to know about that so we
> > >> can look at the caller and figure out how to solve it.
> > >
> > > Yes, that would have to somehow need to annotate the zone_lock to be ok
> > > in those paths so that lockdep doesn't complain.
> >
> > That opens the worst of all cans of worms. If we start this here then
> > Joe programmer and his dog will use these lockdep annotation to evade
> > warnings and when exposed to RT it will fall apart in pieces. Just that
> > at that point Joe programmer moved on to something else and the usual
> > suspects can mop up the pieces. We've seen that all over the place and
> > some people even disable lockdep temporarily because annotations don't
> > help.
> >
> > PeterZ might have opinions about that too I suspect.
>
> PeterZ is mightily confused by all of this -- also heat induced brain
> melt.
>
> I thought the rule was:
>
>  - No allocators (alloc/free) inside raw_spinlock_t, full-stop.
>
> Why are we trying to craft an exception?

So that we can reduce post-grace-period cache misses by a factor of
eight when invoking RCU callbacks.
This reduction in cache misses also makes it more difficult to
overrun RCU with floods of either call_rcu() or kfree_rcu() invocations.

The idea is to allocate page-sized arrays of pointers so that the callback
invocation can sequence through the array instead of walking a linked
list, hence the reduction in cache misses.  If the allocation fails,
for example, during OOM events, we fall back to the linked-list approach.
So, as with much of the rest of the kernel, under OOM we just run a
bit slower.

							Thanx, Paul
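
For illustration only, the array-with-fallback idea described above can be
sketched in userspace C. This is not the kernel's implementation: malloc()
and free() stand in for the kernel allocators, and PAGE_BYTES, struct obj,
struct ptr_block, struct batch, batch_queue(), batch_drain(), and the
alloc_ok flag are all hypothetical names made up for this sketch. The point
is only that the drain loop indexes contiguous pointers when the page-sized
array was obtained, and walks a linked list when it was not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_BYTES 4096	/* assumed page size, for illustration */

/* Hypothetical object whose freeing has been deferred. */
struct obj {
	struct obj *next;	/* used only on the linked-list fallback path */
	int id;
};

/* A page-sized block: a count followed by as many pointers as fit. */
struct ptr_block {
	size_t nr;
	struct obj *ptrs[];
};

#define PTRS_PER_BLOCK \
	((PAGE_BYTES - sizeof(struct ptr_block)) / sizeof(struct obj *))

struct batch {
	struct ptr_block *block;	/* NULL until (and unless) allocated */
	struct obj *list;		/* fallback: singly linked list */
};

/*
 * Queue an object for later freeing.  Returns 1 if it was stored in the
 * pointer array and 0 if it was pushed onto the fallback list.  Passing
 * alloc_ok == 0 simulates an allocation failure such as an OOM event.
 */
static int batch_queue(struct batch *b, struct obj *o, int alloc_ok)
{
	if (!b->block && alloc_ok) {
		b->block = malloc(PAGE_BYTES);
		if (b->block)
			b->block->nr = 0;
	}
	if (b->block && b->block->nr < PTRS_PER_BLOCK) {
		b->block->ptrs[b->block->nr++] = o;
		return 1;
	}
	/* No block, or block full: fall back to the linked list. */
	o->next = b->list;
	b->list = o;
	return 0;
}

/* Free everything queued and return the number of objects freed. */
static size_t batch_drain(struct batch *b)
{
	size_t freed = 0;

	if (b->block) {
		assert(b->block->nr <= PTRS_PER_BLOCK);
		/* Array path: each pointer is found by indexing. */
		for (size_t i = 0; i < b->block->nr; i++, freed++)
			free(b->block->ptrs[i]);
		free(b->block);
		b->block = NULL;
	}
	/* List path: each object must be read to find the next one. */
	while (b->list) {
		struct obj *o = b->list;

		b->list = o->next;
		free(o);
		freed++;
	}
	return freed;
}
```

In this sketch a failed (or simulated-failed) block allocation is not an
error: the object simply lands on the list and is still freed at drain
time, just via the slower pointer-chasing path.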