Hello,

On Wed, Apr 26, 2023 at 02:17:03PM -0700, Paul E. McKenney wrote:
> But the idea here is to spread the load of queueing the work as well as
> spreading the load of invoking the callbacks.
>
> I suppose that I could allocate an array of ints, gather the online CPUs
> into that array, and do a power-of-two distribution across that array.
> But RCU Tasks allows CPUs to go offline with queued callbacks, so this
> array would also need to include those CPUs as well as the ones that
> are online.

Ah, I see, so it needs to make the distinction between cpus which have
never been online and those which are currently offline but used to be
online.

> Given that the common-case system has a dense cpus_online_mask, I opted
> to keep it simple, which is optimal in the common case.
>
> Or am I missing a trick here?

The worry is that on systems with actual CPU hotplugging, cpu_online_mask
can be pretty sparse - e.g. 1/4 filled wouldn't be too out there. In such
cases, the current code would end up scheduling the work items on the
issuing CPU (which is what WORK_CPU_UNBOUND does) 3/4 of the time, which
probably isn't the desired behavior.

So, I can initialize all per-cpu workqueues for all possible cpus on boot
so that rcu doesn't have to worry about it, but that would still have a
similar problem of the callbacks not really being spread as intended.

I think it depends on how important it is to spread the callback workload
evenly. If that matters quite a bit, it probably would make sense to
maintain a cpumask for has-ever-been-online CPUs. Otherwise, do you think
it can just use an unbound workqueue and forget about manually
distributing the workload?

Thanks.

-- 
tejun