On Thu, Nov 08, 2018 at 05:38:51PM +0100, Sebastian Andrzej Siewior wrote: > On 2018-11-01 16:12:28 [-0700], Paul E. McKenney wrote: > > > The current check via srcu_online is slightly racy because after looking > > > at srcu_online there could be an interrupt that interrupted us long > > > enough until the CPU we checked against went offline. > > > > I don't see how this can happen, even in -rt. The call to > > srcu_offline_cpu() happens very early in the CPU removal process, > > which means that the synchronize_rcu_mult(call_rcu, call_rcu_sched) > > in sched_cpu_deactivate() would wait for the interrupt to complete. > > And for the enclosing preempt_disable region to complete. > > Is this again a hidden RCU detail that preempt_disable() on CPU4 is > enough to ensure that CPU2 does not get marked offline between? The call_rcu_sched parameter to synchronize_rcu_mult() makes this work. This synchronize_rcu_mult() call is in sched_cpu_deactivate(), so it is a hidden sched/RCU detail, I guess. Or am I missing the point of your question? > > Or is getting rid of that preempt_disable region the real reason for > > this change? > > Well, that preempt_disable() + queue_(delayed_)work() does not work -RT. > But looking further, that preempt_disable() while looking at online CPUs > didn't look good. That is why it is invoked from the very early CPU-hotplug notifier. That early in the process, the preempt_disable() does prevent the current CPU from being taken offline twice: Once due to synchronize_rcu_mult(), and once due to the stop-machine call. > > > An alternative would be to hold the hotplug rwsem (so the CPUs don't > > > change their state) and then check based on cpu_online() if we queue it > > > on a specific CPU or not. queue_work_on() itself can handle if something > > > is enqueued on an offline CPU but a timer which is enqueued on an offline > > > CPU won't fire until the CPU is back online. > > > > > > I am not sure if the removal in rcu_init() is okay or not. I assume that > > > SRCU won't enqueue a work item before SRCU is up and ready. > > > > That was the case before the current merge window, but use of call_srcu() > > by tracing means that SRCU needs to be able to deal with call_srcu() > > long before any initialization has happened. The actual callbacks > > won't be invoked until much later, after the scheduler and workqueues > > are completely up and running, but call_srcu() can be invoked very early. > > > > But I am not seeing any removal in rcu_init() in this patch, so I might > > be missing something. > > The description is not up-to-date. There was this hunk: > |@@ -4236,8 +4232,6 @@ void __init rcu_init(void) > | for_each_online_cpu(cpu) { > | rcutree_prepare_cpu(cpu); > | rcu_cpu_starting(cpu); > |- if (IS_ENABLED(CONFIG_TREE_SRCU)) > |- srcu_online_cpu(cpu); > | } > | } > > which got removed in v4.16. Ah! Here is the current rcu_init() code: for_each_online_cpu(cpu) { rcutree_prepare_cpu(cpu); rcu_cpu_starting(cpu); rcutree_online_cpu(cpu); } And rcutree_online_cpu() calls srcu_online_cpu() when CONFIG_TREE_SRCU is enabled, so no need for the direct call from rcu_init(). Thanx, Paul