On Fri, Sep 02, 2022 at 05:21:32PM +0200, Frederic Weisbecker wrote: [..] > > + > > raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags); > > > > trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, reason); > [...] > > @@ -705,12 +816,21 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) > > my_rdp->nocb_gp_gp = needwait_gp; > > my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0; > > > > - if (bypass && !rcu_nocb_poll) { > > - // At least one child with non-empty ->nocb_bypass, so set > > - // timer in order to avoid stranding its callbacks. > > - wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS, > > - TPS("WakeBypassIsDeferred")); > > + // At least one child with non-empty ->nocb_bypass, so set > > + // timer in order to avoid stranding its callbacks. > > + if (!rcu_nocb_poll) { > > + // If bypass list only has lazy CBs. Add a deferred > > + // lazy wake up. > > + if (lazy && !bypass) { > > + wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_LAZY, > > + TPS("WakeLazyIsDeferred")); > > What if: > > 1) rdp(1) has lazy callbacks > 2) all other rdp's have no callback at all > 3) nocb_gp_wait() runs through all rdp's, everything is handled, except for > these lazy callbacks > 4) It reaches the above path, ready to arm the RCU_NOCB_WAKE_LAZY timer, > but it hasn't yet called wake_nocb_gp_defer() > 5) Oh but rdp(2) queues a non-lazy callback. interrupts are disabled so it defers > the wake up to nocb_gp_wait() with arming the timer in RCU_NOCB_WAKE. > 6) nocb_gp_wait() finally calls wake_nocb_gp_defer() and override the timeout > to several seconds ahead. > 7) No more callbacks queued, the non-lazy callback will have to wait several > seconds to complete. > > Or did I miss something? Note that the race exists with RCU_NOCB_WAKE_BYPASS > but it's only about one jiffy delay, not seconds. > So I think the below patch should fix that. But I have not tested it at all and it could very well have issues. In particular, there is a likelihood of a wake up while holding a lock which I'm not sure is safe due to scheduler locks. I'll test it next week. Let me know any thoughts though. ---8<----------------------- From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx> Subject: [PATCH] rcu: Fix race where wake_nocb_gp_defer() lazy wake can overwrite a non-lazy wake Fix by holding nocb_gp_lock when observing the state of all rdps. If any rdp queued a non-lazy CB, we would do a wake up of the main gp thread. This should address the race Frederick reported (which could effect both non-lazy CBs using the bypass list, and lazy CBs, though lazy CBs much more noticeably). Quoting from Frederic's email: 1) rdp(1) has lazy callbacks 2) all other rdp's have no callback at all 3) nocb_gp_wait() runs through all rdp's, everything is handled, except for these lazy callbacks 4) It reaches the above path, ready to arm the RCU_NOCB_WAKE_LAZY timer, but it hasn't yet called wake_nocb_gp_defer() 5) Oh but rdp(2) queues a non-lazy callback. interrupts are disabled so it defers the wake up to nocb_gp_wait() with arming the timer in RCU_NOCB_WAKE. 6) nocb_gp_wait() finally calls wake_nocb_gp_defer() and override the timeout to several seconds ahead. 7) No more callbacks queued, the non-lazy callback will have to wait several seconds to complete. Here, the nocb gp lock is held when #4 happens. So the deferred wakeup on #5 has to wait till #4 finishes. Reported-by: Frederic Weisbecker <fweisbec@xxxxxxxxx> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx> --- kernel/rcu/tree_nocb.h | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index 8b46442e4473..6690ece8fe20 100644 --- a/kernel/rcu/tree_nocb.h +++ b/kernel/rcu/tree_nocb.h @@ -285,14 +285,15 @@ EXPORT_SYMBOL(rcu_lazy_get_jiffies_till_flush); * Arrange to wake the GP kthread for this NOCB group at some future * time when it is safe to do so. */ -static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype, - const char *reason) +static void __wake_nocb_gp_defer(struct rcu_data *rdp, int waketype, + const char *reason, bool locked) { unsigned long flags; struct rcu_data *rdp_gp = rdp->nocb_gp_rdp; unsigned long mod_jif = 0; - raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags); + if (!locked) + raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags); /* * Bypass and lazy wakeup overrides previous deferments. In case of @@ -323,11 +324,23 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype, if (rdp_gp->nocb_defer_wakeup < waketype) WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype); - raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags); + if (!locked) + raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags); trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, reason); } +static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype, + const char *reason) { + __wake_nocb_gp_defer(rdp, waketype, reason, false); +} + + +static void wake_nocb_gp_defer_locked(struct rcu_data *rdp, int waketype, + const char *reason) { + __wake_nocb_gp_defer(rdp, waketype, reason, true); + /* * Flush the ->nocb_bypass queue into ->cblist, enqueuing rhp if non-NULL. * However, if there is a callback to be enqueued and if ->nocb_bypass @@ -754,6 +767,8 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) * is added to the list, so the skipped-over rcu_data structures * won't be ignored for long. */ + + raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags); list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp, 1) { bool needwake_state = false; bool flush_bypass = false; @@ -855,14 +870,15 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) // If bypass list only has lazy CBs. Add a deferred // lazy wake up. if (lazy && !bypass) { - wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_LAZY, + wake_nocb_gp_defer_locked(my_rdp, RCU_NOCB_WAKE_LAZY, TPS("WakeLazyIsDeferred")); // Otherwise add a deferred bypass wake up. } else if (bypass) { - wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS, + wake_nocb_gp_defer_locked(my_rdp, RCU_NOCB_WAKE_BYPASS, TPS("WakeBypassIsDeferred")); } } + raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags); if (rcu_nocb_poll) { /* Polling, so trace if first poll in the series. */ -- 2.37.2.789.g6183377224-goog