On Fri, Nov 18, 2022 at 05:10:50PM -0800, Paul E. McKenney wrote: > On Thu, Nov 17, 2022 at 06:16:25PM +0800, Pingfan Liu wrote: > > On Wed, Nov 16, 2022 at 09:59:22AM -0800, Paul E. McKenney wrote: > > > On Wed, Nov 16, 2022 at 09:56:26AM +0800, Pingfan Liu wrote: > > > > The pair of segcblist operations: rcu_segcblist_advance() and > > > > rcu_segcblist_accelerate() in srcu_gp_start() is needless from two > > > > perspectives: > > > > > > > > -1. As a part of the SRCU state machine, it should take care of either > > > > all sda or none. But here it only takes care of a single sda. > > > > > > I am not so sure that I agree. > > > > > > Taking care of all srcu_node structures' callbacks would be extremely > > > expensive, with the expense increasing with the number of CPUs. However, > > > the cost of taking care of the current CPU's callbacks is quite cheap, > > > at least in the case where the srcu_struct structure's ->srcu_size_state > > > field is equal to SRCU_SIZE_BIG. > > > > > > But are you seeing performance issues with that code on real hardware > > > running real workloads when the srcu_struct structure's ->srcu_size_state > > > field does not equal SRCU_SIZE_BIG? My guess is "no" given that this > > > code has already paid the cost of acquiring the srcu_struct structure's > > > ->lock, but I figured I should ask. The opinion of real hardware running > > > real workloads beats any of our opinions, after all. ;-) > > > > > > If you are seeing real slowdowns, then one option is to decrease the > > > default value of big_cpu_lim from 128 to some appropriate smaller value. > > > Maybe 16 or 32? > > > > > > Alternatively, the code in srcu_gp_start() that this patch removes might > > > be executed only when ssp->srcu_size_state is equal to SRCU_SIZE_BIG. > > > But this approach would likely still have performance issues due to the > > > acquisition of srcu_struct structure's ->lock, right? > > > > > > Also, if you are seeing real slowdowns, please place that information > > > in the commit log. > > > > Thanks for stiring up so much internal information. It is useful to > > guild me through the code reading. > > > > I have not seen any slowdowns, just try to organize the code. Since the > > note before srcu_might_be_idle() explains why it only evaluate a local > > sda, while srcu_gp_start() has not, so I am puzzled about it. > > Locality of reference is a good thing in general, so unless it is a > strange case, there won't always be a "do this to promote locality of > reference" comment. > Learned. > > > Thoughts? > > > > > > > -2. From the viewpoint of the callback, at the entrance, > > > > srcu_gp_start_if_needed() has called that pair operations and attached > > > > it with gp_seq. At the exit, srcu_invoke_callbacks() calls that pair > > > > again to extract the done callbacks. So the harvesting of the callback > > > > is not affected by the call to that pair in srcu_gp_start(). > > > > > > > > But because the updating of RCU_DONE_TAIL by srcu_invoke_callbacks() may > > > > have some delay than by srcu_gp_end()->srcu_gp_start(), the removal may > > > > cause srcu_might_be_idle() not to be real time. To compensate that, > > > > supplement that pair just before the calling to rcu_segcblist_pend_cbs() > > > > in srcu_might_be_idle(). > > > > > > I agree that srcu_might_be_idle() is not a bad place to add such a check, > > > especially given that it has other reasons to acquire the srcu_data > > > structure's ->lock. Except that this misses both call_srcu() and > > > synchronize_srcu_expedited(), so that if a workload invokes call_srcu() > > > and/or synchronize_srcu_expedited() on a given srcu_struct structure, > > > but never synchronize)srcu(), the opportunity to advance that CPU's > > > callbacks at the beginning of a grace period will always be lost. > > > > srcu_gp_start_if_needed() has rcu_segcblist_{advance/accelerate}() > > unconditionally, so the beginning of a grace period will be always > > detected via any path through __call_srcu(). Do I miss anything? > > Grace periods can also be started by srcu_gp_end(). > But it is helpless for assiciating a gp with the enqueued callback. > > And I think the core issue is if it matters that the updating of > > segcblist defers from srcu_gp_end()->srcu_gp_start() to > > srcu_invoke_callbacks(). > > The main reason for advancing/accelerating from __call_srcu() is to > make sure that the newly enqueued callback will be associated with a > grace period. Otherwise, the newly enqueued callback might well just > sit there for a very long time. > > Unlike normal RCU, there is no callback polling mechanism in SRCU. > Nor can there reasonably be, due to the potentially very large number > of srcu_struct structures. > > (Yes, if __call_srcu() was to become a performance bottleneck, there > are adjustments that could be made, but let's not increase complexity > before we see a clear need to do so.) > > > > Alternatively, if we have checks in both synchronize_srcu() and > > > srcu_gp_start_if_needed(), we end up duplicating the work in the case > > > where synchronize_srcu() is invoked. This is also not particularly good. > > > > Here srcu_gp_start() has the following caller: > > srcu_advance_state() > > srcu_funnel_gp_start() > > srcu_reschedule() > > srcu_gp_end() > > > > Except srcu_gp_end() increase the gp_seq's counter, the rest has not changed the > > gp_seq's counter. so the calling to rcu_segcblist_{advance/accelerate}() is > > needless there. > > Suppose that this srcu_struct is in state SRCU_SIZE_BIG, so that there > is a callback list for each CPU. > > > Anyway I tried to understand the code and some internal. If you dislike > > it, please skip it. > > Let's skip this for the moment. > Sure. And I have learned a lot from this thread. > > Thanks again for your precious time. > > Please let me know if additional information on locality of reference > would be helpful. > No. That is enough. Appreciate for your detailed explanation. It is very helpful for me to understand the code more deeply. Thanks, Pingfan > Thanx, Paul > > > Pingfan > > > > > Again, thoughts? > > > > > > Thanx, Paul > > > > > > > Test info: > > > > torture test passed using the following command against commit 094226ad94f4 ("Linux 6.1-rc5") > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --configs 18*SRCU-P > > > > > > > > Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx> > > > > Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx> > > > > Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx> > > > > Cc: Frederic Weisbecker <frederic@xxxxxxxxxx> > > > > Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx> > > > > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx> > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> > > > > To: rcu@xxxxxxxxxxxxxxx > > > > --- > > > > kernel/rcu/srcutree.c | 15 ++++----------- > > > > 1 file changed, 4 insertions(+), 11 deletions(-) > > > > > > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > > > > index 725c82bb0a6a..36ba18967133 100644 > > > > --- a/kernel/rcu/srcutree.c > > > > +++ b/kernel/rcu/srcutree.c > > > > @@ -659,21 +659,10 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock); > > > > */ > > > > static void srcu_gp_start(struct srcu_struct *ssp) > > > > { > > > > - struct srcu_data *sdp; > > > > int state; > > > > > > > > - if (smp_load_acquire(&ssp->srcu_size_state) < SRCU_SIZE_WAIT_BARRIER) > > > > - sdp = per_cpu_ptr(ssp->sda, 0); > > > > - else > > > > - sdp = this_cpu_ptr(ssp->sda); > > > > lockdep_assert_held(&ACCESS_PRIVATE(ssp, lock)); > > > > WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed)); > > > > - spin_lock_rcu_node(sdp); /* Interrupts already disabled. */ > > > > - rcu_segcblist_advance(&sdp->srcu_cblist, > > > > - rcu_seq_current(&ssp->srcu_gp_seq)); > > > > - (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, > > > > - rcu_seq_snap(&ssp->srcu_gp_seq)); > > > > - spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */ > > > > WRITE_ONCE(ssp->srcu_gp_start, jiffies); > > > > WRITE_ONCE(ssp->srcu_n_exp_nodelay, 0); > > > > smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */ > > > > @@ -1037,6 +1026,10 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp) > > > > /* If the local srcu_data structure has callbacks, not idle. */ > > > > sdp = raw_cpu_ptr(ssp->sda); > > > > spin_lock_irqsave_rcu_node(sdp, flags); > > > > + rcu_segcblist_advance(&sdp->srcu_cblist, > > > > + rcu_seq_current(&ssp->srcu_gp_seq)); > > > > + (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, > > > > + rcu_seq_snap(&ssp->srcu_gp_seq)); > > > > if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) { > > > > spin_unlock_irqrestore_rcu_node(sdp, flags); > > > > return false; /* Callbacks already present, so not idle. */ > > > > -- > > > > 2.31.1 > > > >