Re: rcu pending

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Sun, 18 Aug 2024 20:52:59 -0700

On Sun, Aug 18, 2024 at 09:19:16PM -0400, Kent Overstreet wrote:
> On Sun, Aug 18, 2024 at 10:01:42AM GMT, Paul E. McKenney wrote:
> > But you only need one callback per live outstanding "cookie" returned
> > from get_state_synchronize_rcu*() or start_poll_synchronize_rcu().
> > Or am I missing something here?
> 
> Maybe I am?
> 
> I've been assuming that if rcu callbacks are getting punted off to a
> kthread that we can't rely on them being completed in any particular
> timeframe - i.e. the number of grace periods with outstanding callbacks
> would be unbounded.

Exactly, and that is why there is a check for expired cookies during the
addition of a new data element that needs to wait for a grace period.
The only purpose of that call_rcu() callback is to act as a fallback in
case there is a long period of time with no newly arriving data elements
that need to wait for a grace period.

> You're saying that NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE _does_ include
> grace periods with outstanding callbacks? Just want to be clear on that.

No, it does not, but given the algorithm outlined in my previous
email, it doesn't need to.  The point of the callbacks is *not* to
unconditionally drive things forward, but instead to provide a fallback
to do the necessary processing in the case nothing new arrives for an
extended period of time.

> > If so, create a structure that as an rcu_head structure and a
> > cookies.  Create an array of this structure (possibly on a per-CPU,
> > per-shard, or whatever basis), sized by NUM_ACTIVE_RCU_POLL_OLDSTATE or
> > NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE, depending on which set of APIs you
> > are using.  Have a lock that guards the full array.
> > 
> > You also need some sort of structure tracking whatever data elements
> > for which the grace periods are intended, but I will not speculate
> > on what that might be.
> > 
> > Then when you have a data element that needs to wait for a grace period:
> > 
> > 1.	Get a cookie from get_state_synchronize_rcu() or similar.
> > 
> > 2.	Acquire the lock.
> > 
> > 3.	If this cookie is already in the array, release the lock and
> > 	you are done.  (Give or take associating the cookie with the
> > 	data element, however you choose to do this.)
> > 
> > 4.	If this cookie has already expired (it can happen!), set the new
> > 	data element up to be processed (or maybe process it immediately,
> > 	as the case may be).  Proceed to the next step only for unexpired
> > 	new cookies, otherwise, release the lock.
> 
> There's another race, though - we're associating sequence numbers from
> get_state_synchrize_rcu() with call_rcu() callbacks, and that's racy -
> they can end up with different grace periods...

Indeed they can end up with different grace periods, and that is just
fine.  Remember, the call_rcu() callbacks are a fallback mechanism in case
of a cessation of new data elements arriving in need of grace periods.
If there is a constant stream of such data elements, the call_rcu()
callback will (almost) always find that its array element has not yet
expired (because some intervening data element overwrote the expired
cookie with a new unexpired cookie), and will thus simply invoke
call_rcu() on itself.

> I think solving that would require a call_rcu_for_gp() API that takes
> the sequence number we previously got from get_state_synchronize_rcu().

You don't need this because this race is harmless.

> I think we can avoid your race in #4 entirely by simply waiting until we
> have irqs disabled to call get_state_synchronize_rcu() (at least for the
> normal RCU variant, the SRCU variant will naturally need a
> srcu_read_lock() if we want to handle it that way).
> 
> That might be beneficial because then call_rcu_for_gp() can never race
> with the grace period expiring.

Except that we don't need to avoid that race.

Or am I missing something subtle here?

							Thanx, Paul