On Fri, Sep 06, 2024 at 12:02:00PM +0530, Neeraj upadhyay wrote:
> On Fri, Sep 6, 2024 at 11:34 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Thu, Sep 05, 2024 at 08:41:02PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Sep 05, 2024 at 08:32:16PM +0200, Frederic Weisbecker wrote:
> > > > On Wed, Sep 04, 2024 at 06:52:36AM -0700, Paul E. McKenney wrote:
> > > > > > Yes, I'm preparing an update for the offending patch (which has one more
> > > > > > embarrassing issue while I'm going through it again).
> > > > >
> > > > > Very good, thank you!
> > > >
> > > > So my proposal for a replacement patch is this (to replace the patch
> > > > of the same name in Neeraj tree):
> > >
> > > FYI, the diffstat against the previous version of the same patch is as follows.
> > > The rationale being:
> > >
> > > 1) rdp->nocb_cb_kthread doesn't need to be protected by nocb_gp_kthread_mutex
> > >
> > > 2) Once rcuoc is parked, we really _must_ observe the callback list counter
> > >    decremented after the barrier's completion.
> > >
> > > 3) This fixes another issue: rcuoc must be parked _before_
> > >    rcu_nocb_queue_toggle_rdp() is called, otherwise a nocb locked sequence
> > >    within rcuoc would race with rcuog clearing SEGCBLIST_OFFLOADED concurrently,
> > >    leaving the nocb locked forever.
> >
> > Thank you!!!
> >
> > Just to make sure that I understand, I apply this patch on top of
> > Neeraj's current set of branches to get the fix, correct?
> >
>
> I have pushed this diff to branch next.06.09.24a of shared-rcu tree
> and started testing.
> Will squash this diff to the original commit later.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git/commit/?h=next.06.09.24a&id=0fc7fc28b5afc3037ae4e1464013bc38c4b51c99

Thank you!  I fired off light testing which is unlikely to be conclusive.
But if it passes, I will do some longer and more focused tests.
							Thanx, Paul

> - Neeraj
>
> > 							Thanx, Paul
> >
> > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > > index 755ada098035..97b99cd06923 100644
> > > --- a/kernel/rcu/tree_nocb.h
> > > +++ b/kernel/rcu/tree_nocb.h
> > > @@ -1056,6 +1056,13 @@ static int rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > >  	/* Flush all callbacks from segcblist and bypass */
> > >  	rcu_barrier();
> > >
> > > +	/*
> > > +	 * Make sure the rcuoc kthread isn't in the middle of a nocb locked
> > > +	 * sequence while offloading is deactivated, along with nocb locking.
> > > +	 */
> > > +	if (rdp->nocb_cb_kthread)
> > > +		kthread_park(rdp->nocb_cb_kthread);
> > > +
> > >  	rcu_nocb_lock_irqsave(rdp, flags);
> > >  	WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
> > >  	WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
> > > @@ -1064,13 +1071,11 @@ static int rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > >  	wake_gp = rcu_nocb_queue_toggle_rdp(rdp);
> > >
> > >  	mutex_lock(&rdp_gp->nocb_gp_kthread_mutex);
> > > +
> > >  	if (rdp_gp->nocb_gp_kthread) {
> > >  		if (wake_gp)
> > >  			wake_up_process(rdp_gp->nocb_gp_kthread);
> > >
> > > -		if (rdp->nocb_cb_kthread)
> > > -			kthread_park(rdp->nocb_cb_kthread);
> > > -
> > >  		swait_event_exclusive(rdp->nocb_state_wq,
> > >  				      rcu_nocb_rdp_deoffload_wait_cond(rdp));
> > >  	} else {