On Mon, Jul 22, 2019 at 10:41:52AM -0300, Jason Gunthorpe wrote:
> On Mon, Jul 22, 2019 at 04:51:49AM -0700, Paul E. McKenney wrote:
> 
> > > > > Would it make sense to have call_rcu() check to see if there are
> > > > > many outstanding requests on this CPU and if so process them
> > > > > before returning?  That would ensure that frequent callers usually
> > > > > ended up doing their own processing.
> > > > 
> > > > Unfortunately, no.  Here is a code fragment illustrating why:
> 
> That is only true in the general case though; kfree_rcu() doesn't have
> this problem, since we know what the callback is doing.  In general, a
> caller of kfree_rcu() should not need to hold any locks while calling
> it.

Good point, at least as long as the slab allocators don't call
kfree_rcu() while holding any of the slab locks.  However, that would
require a separate list for the kfree_rcu() callbacks, along with
concurrent access to those lists.  So this might work, but it would add
complexity and yet another restriction between RCU and another kernel
subsystem.

So I would like to try the other approaches first, for example the
time-based approach in my prototype and Eric Dumazet's more polished
patch.  But the immediate-invocation possibility is still there if
needed.

> We could apply the same idea more generally and have some
> 'call_immediate_or_rcu()' which has restrictions on the caller's
> context.
> 
> I think if we have some kind of problem here it would be better to
> handle it inside the core code and only require that callers use the
> correct RCU API.

Agreed, especially given that there are a number of things that can be
done within RCU itself.

> I can think of many places where kfree_rcu() is being used under user
> control.

And the same goes for call_rcu().

This is not the first time we have run into this.  The last time was
about 15 years ago, if I remember correctly, and that episode led to
some of the quiescent-state forcing and callback-invocation batch-size
tricks still in use today.  My only real surprise is that it took so
long for this to come up again.  ;-)

Please note also that, in the common case with default configurations,
callbacks are invoked on the CPU that posted them.  This means that
callback invocation normally applies backpressure to the callback-happy
workload.

So why then is there a problem?  The problem is not a lack of
backpressure, but rather that the scheduling of callback invocation
needs to be a bit more considerate of the needs of the rest of the
system, at least in the common case.  The uncommon case is real-time
configurations, where care is needed anyway.  But I am in the midst of
helping those out as well; details are on the "dev" branch of -rcu.

							Thanx, Paul
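To make the distinction Jason draws concrete, here is a minimal sketch
contrasting the two APIs.  call_rcu() and kfree_rcu() are the real kernel
interfaces; struct foo and the two release helpers are invented for
illustration.  With call_rcu(), the callback is arbitrary caller-supplied
code, so RCU cannot safely invoke it straight from the call site: the
caller might hold a lock that the callback also acquires.  With
kfree_rcu(), the "callback" is known to be kfree() of the enclosing
object, so no caller-supplied code runs at all:

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	int data;
	struct rcu_head rcu;	/* used by both call_rcu() and kfree_rcu() */
};

/* Arbitrary callback: RCU has no idea what this does, so it cannot
 * know which locks the callback might acquire. */
static void foo_free_cb(struct rcu_head *head)
{
	struct foo *fp = container_of(head, struct foo, rcu);

	kfree(fp);
}

static void foo_release_call_rcu(struct foo *fp)
{
	call_rcu(&fp->rcu, foo_free_cb);	/* callback is opaque to RCU */
}

static void foo_release_kfree_rcu(struct foo *fp)
{
	kfree_rcu(fp, rcu);	/* "callback" is known: kfree() of fp */
}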
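And a sketch of the hypothetical call_immediate_or_rcu() that Jason
suggests, just to pin down the caller-context restriction under
discussion.  This is not a real kernel API, and the
rcu_cpu_has_too_many_callbacks() predicate is invented for the sketch;
the real point is that the caller must be in a context where blocking is
legal, which is exactly what cannot be assumed of arbitrary call_rcu()
callers:

/* Hypothetical, for illustration only.  When this CPU is overloaded
 * with callbacks, wait for a grace period and invoke the callback
 * inline, applying backpressure to the caller; otherwise defer to
 * call_rcu() as usual.  The caller must be able to block: no RCU
 * read-side critical section, no interrupt context, and no locks
 * that the callback might take. */
static void call_immediate_or_rcu(struct rcu_head *head, rcu_callback_t func)
{
	if (rcu_cpu_has_too_many_callbacks()) {	/* invented predicate */
		synchronize_rcu();	/* wait for a full grace period */
		func(head);		/* then run the callback directly */
	} else {
		call_rcu(head, func);
	}
}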