Re: rcu pending

Kent Overstreet <kent.overstreet@xxxxxxxxx> · Sun, 18 Aug 2024 21:19:16 -0400

On Sun, Aug 18, 2024 at 10:01:42AM GMT, Paul E. McKenney wrote:
> But you only need one callback per live outstanding "cookie" returned
> from get_state_synchronize_rcu*() or start_poll_synchronize_rcu().
> Or am I missing something here?

Maybe I am?

I've been assuming that if rcu callbacks are getting punted off to a
kthread, we can't rely on them being completed in any particular
timeframe - i.e. the number of grace periods with outstanding callbacks
would be unbounded.

You're saying that NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE _does_ include
grace periods with outstanding callbacks? Just want to be clear on that.

> If so, create a structure that has an rcu_head structure and a
> cookie.  Create an array of this structure (possibly on a per-CPU,
> per-shard, or whatever basis), sized by NUM_ACTIVE_RCU_POLL_OLDSTATE or
> NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE, depending on which set of APIs you
> are using.  Have a lock that guards the full array.
> 
> You also need some sort of structure tracking whatever data elements
> for which the grace periods are intended, but I will not speculate
> on what that might be.
> 
> Then when you have a data element that needs to wait for a grace period:
> 
> 1.	Get a cookie from get_state_synchronize_rcu() or similar.
> 
> 2.	Acquire the lock.
> 
> 3.	If this cookie is already in the array, release the lock and
> 	you are done.  (Give or take associating the cookie with the
> 	data element, however you choose to do this.)
> 
> 4.	If this cookie has already expired (it can happen!), set the new
> 	data element up to be processed (or maybe process it immediately,
> 	as the case may be).  Proceed to the next step only for unexpired
> 	new cookies, otherwise, release the lock.

There's another race, though - we're associating sequence numbers from
get_state_synchronize_rcu() with call_rcu() callbacks, and that's racy -
they can end up with different grace periods...

I think solving that would require a call_rcu_for_gp() API that takes
the sequence number we previously got from get_state_synchronize_rcu().

I think we can avoid your race in #4 entirely by simply waiting until we
have irqs disabled to call get_state_synchronize_rcu() (at least for the
normal RCU variant; the SRCU variant will naturally need an
srcu_read_lock() if we want to handle it that way).

That might be beneficial because then call_rcu_for_gp() can never race
with the grace period expiring.
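
To make sure we're talking about the same thing, here's a rough userspace
sketch of the quoted steps 1-4 as I read them. It is a toy model, not
kernel code: sim_get_state() and sim_poll_state() are hypothetical
stand-ins for get_state_synchronize_rcu() and poll_state_synchronize_rcu(),
NUM_SLOTS stands in for NUM_ACTIVE_RCU_POLL_OLDSTATE, the grace period
counter is a plain integer, and the lock from step 2 is only marked in a
comment because the sketch is single threaded. Claiming a free slot for an
unexpired new cookie is my guess at what comes after step 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for NUM_ACTIVE_RCU_POLL_OLDSTATE. */
#define NUM_SLOTS 4

/* Toy grace-period state: number of the last completed grace period. */
static unsigned long sim_gp_completed;

/* Stand-in for get_state_synchronize_rcu(): the cookie names the
 * grace period that has to complete before the object may be freed. */
static unsigned long sim_get_state(void)
{
	return sim_gp_completed + 1;
}

/* Stand-in for poll_state_synchronize_rcu(): true once the cookie expired. */
static bool sim_poll_state(unsigned long cookie)
{
	return sim_gp_completed >= cookie;
}

/* One array entry per live cookie (the rcu_head is left out of the model). */
struct pending_slot {
	bool in_use;
	unsigned long cookie;
	unsigned int nr_objs;	/* objects waiting on this cookie */
};

static struct pending_slot slots[NUM_SLOTS];

enum enqueue_result { ENQ_JOINED, ENQ_EXPIRED, ENQ_NEW_SLOT, ENQ_FULL };

/* Steps 2-4 for one object whose cookie was obtained in step 1. */
static enum enqueue_result enqueue_with_cookie(unsigned long cookie)
{
	struct pending_slot *free_slot = NULL;
	size_t i;

	/* Step 2: the lock guarding the whole array would be taken here. */
	for (i = 0; i < NUM_SLOTS; i++) {
		/* Step 3: cookie already tracked, just attach the object. */
		if (slots[i].in_use && slots[i].cookie == cookie) {
			slots[i].nr_objs++;
			return ENQ_JOINED;
		}
		if (!slots[i].in_use && !free_slot)
			free_slot = &slots[i];
	}

	/* Step 4: cookie already expired, caller processes the object now. */
	if (sim_poll_state(cookie))
		return ENQ_EXPIRED;

	/* Assumed next step: claim a slot for the unexpired new cookie. */
	if (!free_slot)
		return ENQ_FULL;
	free_slot->in_use = true;
	free_slot->cookie = cookie;
	free_slot->nr_objs = 1;
	return ENQ_NEW_SLOT;
}

/* Simulate a grace period ending; returns how many objects became free. */
static unsigned int sim_end_grace_period(void)
{
	unsigned int released = 0;
	size_t i;

	sim_gp_completed++;
	for (i = 0; i < NUM_SLOTS; i++) {
		if (!slots[i].in_use || !sim_poll_state(slots[i].cookie))
			continue;
		assert(slots[i].nr_objs > 0);	/* in-use slots track >= 1 object */
		released += slots[i].nr_objs;
		slots[i].in_use = false;
		slots[i].nr_objs = 0;
	}
	return released;
}
```

In this model, two objects enqueued with the same cookie share one slot
and are both released when the simulated grace period ends. Enqueueing
with that stale cookie afterwards hits the step 4 case (ENQ_EXPIRED),
which is the window that fetching the cookie with irqs disabled is meant
to close.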