Re: [PATCH RFC v2 rcu] Fix get_state_synchronize_rcu_full() GP-start detection

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Sun, 26 Jan 2025 18:03:45 -0800

On Sun, Jan 26, 2025 at 08:22:23PM -0500, Joel Fernandes wrote:
> On Sun, Jan 26, 2025 at 8:13 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> > Hi, Paul and Frederic,
> >
> > [...]
> > > > > On Sat, Jan 25, 2025 at 12:03:58AM +0100, Frederic Weisbecker wrote:
> > > > > > Le Fri, Dec 13, 2024 at 11:49:49AM -0800, Paul E. McKenney a écrit :
> > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > > > > > index 2f9c9272cd486..d2a91f705a4ab 100644
> > > > > > > --- a/kernel/rcu/rcu.h
> > > > > > > +++ b/kernel/rcu/rcu.h
> > > > > > > @@ -162,7 +162,7 @@ static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
> > > > > > >  {
> > > > > > >         unsigned long cur_s = READ_ONCE(*sp);
> > > > > > >
> > > > > > > -       return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1));
> > > > > > > +       return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1));
> > > > > >
> > [...]
> > > > > > The way I understand it is that rcu_state.gp_seq might be seen started while
> > > > > > root_rnp->gp_seq is not. So rcu_seq_snap() on the started rcu_state.gp_seq
> > > > > > may return maximum 2 full GPs ahead of root_rnp->gp_seq. And therefore it takes below
> > > > > > 2 GPs to safely deduce we wrapped around.
> > > > >
> > > > > Exactly!
> > > > >
> > > > > > Should it be ULONG_CMP_LT(cur_s, s - (2 * (RCU_SEQ_STATE_MASK + 1))) ?
> > > > >
> > > > > Quite possibly.  I freely admit that I allowed a bit of slop because
> > > > > time was of the essence (holidays and all that) and also it does not
> > > > > hurt much to lose a couple of counts out of a 2^32 cycle, to say nothing
> > > > > of the common-case 2^64 cycle.  It would not hurt to be exact, but it
> > > > > would be necessary to convince ourselves that we were not off by one in
> > > > > the wrong direction.
> > > > >
> > > > > I would be happy to see a patch, as long as it was sufficiently
> > > > > convincing.
> > > >
> > > > I'm not so much concerned about being exact but rather about making
> > > > sure we still understand what we did within one year. We can leave one
> > > > more grace period than what we expect out of paranoia but, the most
> > > > important is that we comment about what we expect and why. Let me
> > > > prepare a patch for that.
> > >
> > > Even better!
> >
> > Do we really have users who could pass such a huge delta wrapped
> > around value to poll() i.e > ULONG_MAX/2 ?  For 32-bit, that would be
> > 2 billion count since the get() (500 million GPs on 32-bit?). I am
> > curious if such a scenario should be a WARN() because also: If more
> > than ULONG_MAX/2 values are possible after get(), then a full or
> > multiple ULONG_MAX wraparound is also possible. This means then both
> > rcu_seq_done() and rcu_seq_done_exact() become unreliable anyway for
> > such stale get() values.
> >
> > I do agree with both your points on the side effect of the patch to
> > rcu_seq_done_exact(), but I am not fully convinced myself about
> > utility of rcu_seq_done_exact() versus the rcu_seq_done() but I could
> > be missing something.
> 
> I want to modify my comment on reliability. rcu_seq_done_exact() is
> certainly more "reliable" than rcu_seq_done() for wraparound delta  >
> ULONG_MAX/2. Still with such a huge wrap around it is not fail proof
> if it lands within the "3 Grace period" window, so if it is not super
> reliable and if the user should not rely on it, then I wonder if it is
> better to not do it and instead warn the user they are playing with
> fire. But a counter-argument might be, landing within the 3 GP window
> is quite low probability...

It is also a harmless false negative.  And another poll within a few
hundred milliseconds will set things straight.  In contrast, if we
used rcu_seq_done(), it would be a good long time before we got out of
false-negative territory.

On a 32-bit system, if there was an expedited grace period every 10
microseconds (just barely possible on small systems), then a 32-bit
counter would wrap in about 12 hours.  So there would be a six-hour
false-negative zone.

So an expedited grace period every 10 microseconds combined with
a six-hour polling delay is unlikely, but not out of the realm of
possibility.

Please feel free to nominate a comment.

> > Otherwise I analyzed the patch and it makes sense to me:
> >
> > Reviewed-by:Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>

Thank you!  I will apply this on my next review.

						Thanx, Paul