On Fri, Jun 14, 2024 at 07:11:50AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 13, 2024 at 08:22:19PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 13, 2024 at 02:32:27PM -0400, Kent Overstreet wrote:
>
> [ . . . ]
>
> After sleeping on it, backing up a level of abstraction, and thus perhaps
> responding a bit more constructively...
>
> And also, thank you very much for looking into this!
>
> > > commit 3bb5d34d6d6e2d5433cce5d248fd81211afedec0
> > > Author: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > > Date:   Mon Jun 10 20:47:03 2024 -0400
> > >
> > >     bcachefs: pending_rcu_items
> > >
> > >     Generic (?) data structure for explicitly tracking pending SRCU items
> > >
> > >     - Can we make this generic across other RCU flavors?
>
> It should be possible, at least for current RCU and SRCU. I think I owe
> you SRCU variants of a couple of existing RCU functions. Later today!

I was wondering how much commonality there is between the core data
structures of RCU and SRCU. If RCU internally has something analogous to
srcu_struct, and we could refer to that object without caring whether
it's for RCU or SRCU, that would be cool... otherwise, we can just do it
a level up with switch statements.

> > >     - Do we need to add a fallback to linked lists when darray allocation
> > >       fails?
>
> In general, as in kfree_rcu(), yes. In your specific case here, maybe not.
>
> > >     - We wish to avoid the overhead of start_poll_synchronize_srcu() if we
> > >       already have items for a given sequence number, but it appears
> > >       get_poll_synchronize_srcu(); start_poll_synchronize_srcu() can
> > >       actually return different sequence numbers within the same read-side
> > >       critical section. Is this the correct way of doing this?
>
> You can always buffer up items that have not yet been assigned a sequence
> number, then after you have enough or after enough time has passed, do a
> single get_poll_synchronize_srcu() for the group.

I don't think we'd want to do that, since that would delay the point at
which objects get reclaimed.

Have a look at my latest version and tell me if you think it works? It
seems to work in my testing.

> > >     - Do we need to ensure the rearming RCU callback is correctly flushed
> > >       when exiting? (Probably)
>
> I believe so, and setting a flag then doing an srcu_barrier() should suffice.

Is the only difference between srcu_barrier() and synchronize_srcu() that
srcu_barrier() is a no-op if no callbacks were enqueued?

> > > +static void pending_rcu_items_process(struct pending_rcu_items *pending)
> > >  {
> > > -	struct bkey_cached *ck = container_of(rcu, struct bkey_cached, rcu);
> > > +	while (pending_rcu_has_items(pending) &&
> > > +	       poll_state_synchronize_srcu(pending->srcu, pending->seq)) {
> > > +		darray_for_each(pending->objs[0], i)
> > > +			pending->process(pending, *i);
>
> Given that you have interrupts and/or bh disabled, there needs to be a
> third set of callbacks, those whose grace periods are already elapsed.

Not following? The pending->process() is exactly that callback.

Also, the new version does processing in process context, without the
lock.
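To make that concrete, the processing path now has roughly the shape
below - this is a sketch, not the exact code: the typedef, the locking,
and keeping the single ->seq from the version quoted above are all just
for illustration:

typedef DARRAY(void *) pending_objs;	/* illustrative: assumes ->objs[] uses this same typedef */

static void pending_rcu_items_process(struct pending_rcu_items *pending)
{
	pending_objs expired = {};

	spin_lock_irq(&pending->lock);
	if (pending_rcu_has_items(pending) &&
	    poll_state_synchronize_srcu(pending->srcu, pending->seq))
		swap(expired, pending->objs[0]);	/* steal the whole expired set: O(1), no allocation */
	spin_unlock_irq(&pending->lock);

	/* run the callbacks in process context, with the lock dropped */
	darray_for_each(expired, i)
		pending->process(pending, *i);

	darray_exit(&expired);
}

i.e. objects only ever move out of ->objs[] wholesale, so there's no
per-object work done under the lock.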
> Unioning one of the ->objs[] sets onto this third set, leaving the source
> ->objs[] set empty, should strongly be O(1) without the need for memory
> allocation. I might be missing something, but at first glance darray
> does not support this.
>
> The pages-of-pointers structure named kvfree_rcu_bulk_data, along with
> the associated structures in kernel/rcu/tree.c, does this work, but (1)
> is open-coded and (2) has to deal with the full range of what the kernel
> can throw at it. It falls back to an embedded rcu_head structure, in
> response to your second question in the commit log.
>
> But perhaps this code could be generalized? Or maybe darray can be
> appropriately extended?

Not sure what you mean by "unioning onto a third set"?

> And one thing that I missed the first time around is that you are using
> a single ->seq for two sets of objects (->objs[0] and ->objs[1]).
> Each set needs its own sequence number.

I've got this changed in my latest version, but I did see one BUG_ON()
where we seem to have three sequence numbers in flight. Any idea what
might be up with that?
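For reference, the layout I've got now is roughly the following - again a
sketch, with field names and the void * payload type approximate rather
than the exact code:

struct pending_rcu_items_seq {
	unsigned long	seq;	/* grace period cookie this set is waiting on */
	DARRAY(void *)	objs;	/* objects that become safe to process once @seq has elapsed */
};

struct pending_rcu_items {
	struct srcu_struct		*srcu;
	spinlock_t			lock;
	struct pending_rcu_items_seq	objs[2];	/* one set per in-flight grace period */
	void				(*process)(struct pending_rcu_items *, void *);
};

The BUG_ON() I mentioned is the one that fires when a third distinct
sequence number turns up while both sets are still waiting on theirs.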