Hello, Paul.

On Wed, Apr 26, 2023 at 02:55:04PM -0700, Paul E. McKenney wrote:
> But if the call_rcu_tasks_*() code detects too much lock contention on
> CPU 0's queue, which indicates that very large numbers of callbacks are
> being queued, it switches to per-CPU mode. In which case, we are likely
> to have lots of callbacks on lots of queues, and in that case we really
> want to invoke them concurrently.
>
> Then if a later grace period finds that there are no more callbacks, it
> switches back to CPU-0 mode. So this extra workqueue overhead should
> happen only on systems with sparse cpu_online_masks that are under heavy
> call_rcu_tasks_*() load.

I still wonder whether this could be solved by simply switching to
unbound workqueues instead of implementing a custom load-spreading
mechanism. We'd basically be asking the scheduler to do what it thinks
is best instead of making manual CPU placement decisions.

That said, as a fix, the original patch looks fine to me. Gonna go ack
that.

Thanks.

--
tejun
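P.S. To sketch the direction I mean, assuming a hypothetical work item
rtp_work and callback-invoking function rtp_invoke_cbs() (illustrative
names, not the actual RCU Tasks code), the difference is roughly:

	#include <linux/workqueue.h>

	/* Hypothetical: invoke the callbacks queued for this work item. */
	static void rtp_invoke_cbs(struct work_struct *w) { }
	static DECLARE_WORK(rtp_work, rtp_invoke_cbs);

	static void dispatch_pinned(int cpu)
	{
		/* Manual placement: pin the work item to a chosen online CPU. */
		queue_work_on(cpu, system_wq, &rtp_work);
	}

	static void dispatch_unbound(void)
	{
		/* Unbound: let the scheduler decide where the work runs. */
		queue_work(system_unbound_wq, &rtp_work);
	}

With system_unbound_wq, work items can still run concurrently (it is
created with a high max_active), but CPU selection is left to the
scheduler rather than to the caller.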