On Wed, Jun 17, 2020 at 04:46:09PM -0700, Paul E. McKenney wrote:
> > +	// Handle two first channels.
> > +	for (i = 0; i < FREE_N_CHANNELS; i++) {
> > +		for (; bkvhead[i]; bkvhead[i] = bnext) {
> > +			bnext = bkvhead[i]->next;
> > +			debug_rcu_bhead_unqueue(bkvhead[i]);
> > +
> > +			rcu_lock_acquire(&rcu_callback_map);
> > +			if (i == 0) { // kmalloc() / kfree().
> > +				trace_rcu_invoke_kfree_bulk_callback(
> > +					rcu_state.name, bkvhead[i]->nr_records,
> > +					bkvhead[i]->records);
> > +
> > +				kfree_bulk(bkvhead[i]->nr_records,
> > +					bkvhead[i]->records);
> > +			} else { // vmalloc() / vfree().
> > +				for (j = 0; j < bkvhead[i]->nr_records; j++) {
> > +					trace_rcu_invoke_kfree_callback(
> > +						rcu_state.name,
> > +						bkvhead[i]->records[j], 0);
> > +
> > +					vfree(bkvhead[i]->records[j]);
> > +				}
> > +			}
> > +			rcu_lock_release(&rcu_callback_map);
>
> Not an emergency, but did you look into replacing this "if" statement
> with an array of pointers to functions implementing the legs of the
> "if" statement?  If nothing else, this would greatly reduce indentation.

I don't think that replacing direct function calls with indirect function
calls is a great suggestion with the current state of play around branch
prediction.

I'd suggest:

	rcu_lock_acquire(&rcu_callback_map);
	trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
			bkvhead[i]->nr_records, bkvhead[i]->records);
	if (i == 0) {
		kfree_bulk(bkvhead[i]->nr_records, bkvhead[i]->records);
	} else {
		for (j = 0; j < bkvhead[i]->nr_records; j++) {
			vfree(bkvhead[i]->records[j]);
		}
	}
	rcu_lock_release(&rcu_callback_map);

But I'd also suggest a vfree_bulk be added.  There are a few things
which would be better done in bulk as part of the vfree process
(we batch them up already, but I'm sure we could do better).
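For illustration only, here is a userspace sketch of the calling convention such a vfree_bulk() might take, mirroring kfree_bulk()'s (size_t count, void **addrs) signature.  The real kernel version would batch vmap-area teardown and TLB flushing rather than freeing one pointer at a time; free() stands in for vfree() here, and the slot-clearing is an illustration-only extra (kfree_bulk() leaves the array untouched):

```c
#include <stdlib.h>
#include <stddef.h>

/*
 * Hypothetical sketch of a vfree_bulk() interface, modeled on
 * kfree_bulk(size_t count, void **addrs).  In the kernel this would
 * batch the expensive parts of vfree() (vmap-area removal, TLB
 * flushes) across all the addresses; in this userspace stand-in we
 * simply loop over free().  Clearing each slot after freeing is not
 * part of the proposal -- it just makes the demo easy to check.
 */
static void vfree_bulk(size_t count, void **addrs)
{
	size_t i;

	for (i = 0; i < count; i++) {
		free(addrs[i]);
		addrs[i] = NULL;	/* illustration only */
	}
}
```

A caller would then collect its pointers into an array and hand the whole batch over in one call, just as the kfree_bulk() leg above already does.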