Re: One-off rcu_nocb_rdp_deoffload bug

On Fri, Sep 06, 2024 at 12:02:00PM +0530, Neeraj Upadhyay wrote:
> On Fri, Sep 6, 2024 at 11:34 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Thu, Sep 05, 2024 at 08:41:02PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Sep 05, 2024 at 08:32:16PM +0200, Frederic Weisbecker wrote:
> > > > On Wed, Sep 04, 2024 at 06:52:36AM -0700, Paul E. McKenney wrote:
> > > > > > Yes, I'm preparing an update for the offending patch (which has one more
> > > > > > embarrassing issue that I found while going through it again).
> > > > >
> > > > > Very good, thank you!
> > > >
> > > > So my proposal for a replacement patch is this (to replace the patch
> > > > of the same name in Neeraj's tree):
> > >
> > > FYI, the diff against the previous version of the same patch is included below.
> > > The rationale is:
> > >
> > > 1) rdp->nocb_cb_kthread doesn't need to be protected by nocb_gp_kthread_mutex
> > >
> > > 2) Once rcuoc is parked, we really _must_ observe the callback list counter decremented
> > >    after the barrier's completion.
> > >
> > > 3) This fixes another issue: rcuoc must be parked _before_
> > >    rcu_nocb_queue_toggle_rdp() is called. Otherwise a nocb locked sequence
> > >    within rcuoc would race with rcuog concurrently clearing SEGCBLIST_OFFLOADED,
> > >    leaving the nocb lock held forever.
> >
> > Thank you!!!
> >
> > Just to make sure that I understand, I apply this patch on top of
> > Neeraj's current set of branches to get the fix, correct?
> >
> 
> I have pushed this diff to branch next.06.09.24a of the shared-rcu tree [1]
> and started testing.
> I will squash this diff into the original commit later.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git/commit/?h=next.06.09.24a&id=0fc7fc28b5afc3037ae4e1464013bc38c4b51c99

Thank you!  I fired off light testing which is unlikely to be conclusive.
But if it passes, I will do some longer and more focused tests.

							Thanx, Paul

> - Neeraj
> 
> >                                                         Thanx, Paul
> >
> > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > > index 755ada098035..97b99cd06923 100644
> > > --- a/kernel/rcu/tree_nocb.h
> > > +++ b/kernel/rcu/tree_nocb.h
> > > @@ -1056,6 +1056,13 @@ static int rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > >       /* Flush all callbacks from segcblist and bypass */
> > >       rcu_barrier();
> > >
> > > +     /*
> > > +      * Make sure the rcuoc kthread isn't in the middle of a nocb locked
> > > +      * sequence while offloading is deactivated, along with nocb locking.
> > > +      */
> > > +     if (rdp->nocb_cb_kthread)
> > > +             kthread_park(rdp->nocb_cb_kthread);
> > > +
> > >       rcu_nocb_lock_irqsave(rdp, flags);
> > >       WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
> > >       WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
> > > @@ -1064,13 +1071,11 @@ static int rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > >       wake_gp = rcu_nocb_queue_toggle_rdp(rdp);
> > >
> > >       mutex_lock(&rdp_gp->nocb_gp_kthread_mutex);
> > > +
> > >       if (rdp_gp->nocb_gp_kthread) {
> > >               if (wake_gp)
> > >                       wake_up_process(rdp_gp->nocb_gp_kthread);
> > >
> > > -             if (rdp->nocb_cb_kthread)
> > > -                     kthread_park(rdp->nocb_cb_kthread);
> > > -
> > >               swait_event_exclusive(rdp->nocb_state_wq,
> > >                                     rcu_nocb_rdp_deoffload_wait_cond(rdp));
> > >       } else {
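
To illustrate point 3 of the rationale above, here is a rough userspace
sketch of why the ordering matters. It is not kernel code: struct toy_rdp
and the toy_* helpers are hypothetical names made up for this example, and
the sketch assumes (as point 3 implies) that nocb locking and unlocking are
each conditional on the offloaded state. The race is replayed as a fixed,
single-threaded interleaving rather than with real threads.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU state; NOT the kernel's struct rcu_data. */
struct toy_rdp {
	bool offloaded;  /* models the SEGCBLIST_OFFLOADED flag */
	int lock_depth;  /* 1 while the modeled nocb lock is held, else 0 */
};

/* Assumption: the lock is only taken while the rdp is offloaded. */
static void toy_nocb_lock(struct toy_rdp *rdp)
{
	if (rdp->offloaded)
		rdp->lock_depth++;
}

/* Assumption: the flag is re-read here, so unlock can be skipped. */
static void toy_nocb_unlock(struct toy_rdp *rdp)
{
	if (rdp->offloaded)
		rdp->lock_depth--;
}

/* Old order: the flag is cleared while rcuoc is inside a locked sequence. */
static int toy_toggle_before_park(void)
{
	struct toy_rdp rdp = { .offloaded = true, .lock_depth = 0 };

	toy_nocb_lock(&rdp);     /* rcuoc enters a locked sequence */
	rdp.offloaded = false;   /* rcuog clears the flag in the meantime */
	toy_nocb_unlock(&rdp);   /* rcuoc sees "not offloaded", skips unlock */
	return rdp.lock_depth;   /* lock is leaked */
}

/* New order: rcuoc finishes its locked sequence and parks first. */
static int toy_park_before_toggle(void)
{
	struct toy_rdp rdp = { .offloaded = true, .lock_depth = 0 };

	toy_nocb_lock(&rdp);
	toy_nocb_unlock(&rdp);   /* rcuoc leaves the locked sequence, then parks */
	rdp.offloaded = false;   /* flag cleared only once rcuoc is parked */
	return rdp.lock_depth;   /* lock is balanced */
}
```

In this model toy_toggle_before_park() returns 1 (the lock is never
released) and toy_park_before_toggle() returns 0, which is the property
that moving kthread_park() ahead of rcu_nocb_queue_toggle_rdp() is meant
to guarantee.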



