On Fri, Jun 14, 2024 at 07:11:50AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 13, 2024 at 08:22:19PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 13, 2024 at 02:32:27PM -0400, Kent Overstreet wrote:
>
> [ . . . ]
>
> After sleeping on it, backing up a level of abstraction, and thus perhaps
> responding a bit more constructively...
>
> And also, thank you very much for looking into this!
>
> > > commit 3bb5d34d6d6e2d5433cce5d248fd81211afedec0
> > > Author: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > > Date:   Mon Jun 10 20:47:03 2024 -0400
> > >
> > >     bcachefs: pending_rcu_items
> > >
> > >     Generic (?) data structure for explicitly tracking pending SRCU items
> > >
> > >     - Can we make this generic across other RCU flavors?
>
> It should be possible, at least for current RCU and SRCU. I think I owe
> you SRCU variants of a couple of existing RCU functions. Later today!

I was wondering how much commonality there is between the core data
structures of RCU and SRCU. If RCU internally has something analogous to
srcu_struct, and we could refer to that object without caring whether
it's for RCU or SRCU, that would be cool... otherwise, we can just do it
a level up with switch statements.

> > >     - Do we need to add a fallback to linked lists when darray allocation
> > >       fails?
>
> In general, as in kfree_rcu(), yes. In your specific case here, maybe not.
>
> > >     - We wish to avoid the overhead of start_poll_synchronize_srcu() if we
> > >       already have items for a given sequence number, but it appears
> > >       get_poll_synchronize_srcu(); start_poll_synchronize_srcu() can
> > >       actually return different sequence numbers within the same read-side
> > >       critical section. Is this the correct way of doing this?
>
> You can always buffer up items that have not yet been assigned a sequence
> number, then after you have enough or after enough time has passed, do a
> single get_poll_synchronize_srcu() for the group.

I don't think we'd want to do that, since that would delay the point at
which objects get reclaimed.

Have a look at my latest version and tell me if you think it works? It
seems to work in my testing.

> > >     - Do we need to ensure the rearming RCU callback is correctly flushed
> > >       when exiting? (Probably)
>
> I believe so, and setting a flag then doing an srcu_barrier() should suffice.

Is the only difference between srcu_barrier() and synchronize_srcu() that
srcu_barrier() is a no-op if no callbacks were enqueued?

> > > +static void pending_rcu_items_process(struct pending_rcu_items *pending)
> > >  {
> > > -	struct bkey_cached *ck = container_of(rcu, struct bkey_cached, rcu);
> > > +	while (pending_rcu_has_items(pending) &&
> > > +	       poll_state_synchronize_srcu(pending->srcu, pending->seq)) {
> > > +		darray_for_each(pending->objs[0], i)
> > > +			pending->process(pending, *i);
>
> Given that you have interrupts and/or bh disabled, there needs to be a
> third set of callbacks, those whose grace periods are already elapsed.

Not following? The pending->process() is exactly that callback.

Also, the new version does processing in process context, without the
lock.
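To make that concrete, the processing path now has roughly the shape
below - this is a sketch, not the exact code: the typedef, the locking,
and keeping the single ->seq from the version quoted above are all just
for illustration:

typedef DARRAY(void *) pending_objs;	/* illustrative: assumes ->objs[] uses this same typedef */

static void pending_rcu_items_process(struct pending_rcu_items *pending)
{
	pending_objs expired = {};

	spin_lock_irq(&pending->lock);
	if (pending_rcu_has_items(pending) &&
	    poll_state_synchronize_srcu(pending->srcu, pending->seq))
		swap(expired, pending->objs[0]);	/* steal the whole expired set: O(1), no allocation */
	spin_unlock_irq(&pending->lock);

	/* run the callbacks in process context, with the lock dropped */
	darray_for_each(expired, i)
		pending->process(pending, *i);

	darray_exit(&expired);
}

i.e. objects only ever move out of ->objs[] wholesale, so there's no
per-object work done under the lock.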
> Unioning one of the ->objs[] sets onto this third set, leaving the source
> ->objs[] set empty, should strongly be O(1) without the need for memory
> allocation. I might be missing something, but at first glance darray
> does not support this.
>
> The pages-of-pointers structure named kvfree_rcu_bulk_data, along with
> the associated structures in kernel/rcu/tree.c, does this work, but (1)
> is open-coded and (2) has to deal with the full range of what the kernel
> can throw at it. It falls back to an embedded rcu_head structure, in
> response to your second question in the commit log.
>
> But perhaps this code could be generalized? Or maybe darray can be
> appropriately extended?

Not sure what you mean by "unioning onto a third set"?

> And one thing that I missed the first time around is that you are using
> a single ->seq for two sets of objects (->objs[0] and ->objs[1]).
> Each set needs its own sequence number.

I've got this changed in my latest version, but I did see one BUG_ON()
where we seem to have three sequence numbers in flight. Any idea what
might be up with that?
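For reference, the layout I've got now is roughly the following - again a
sketch, with field names and the void * payload type approximate rather
than the exact code:

struct pending_rcu_items_seq {
	unsigned long	seq;	/* grace period cookie this set is waiting on */
	DARRAY(void *)	objs;	/* objects that become safe to process once @seq has elapsed */
};

struct pending_rcu_items {
	struct srcu_struct		*srcu;
	spinlock_t			lock;
	struct pending_rcu_items_seq	objs[2];	/* one set per in-flight grace period */
	void				(*process)(struct pending_rcu_items *, void *);
};

The BUG_ON() I mentioned is the one that fires when a third distinct
sequence number turns up while both sets are still waiting on theirs.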