On Wed, Dec 14, 2022 at 10:24:46PM +0800, Pingfan Liu wrote:
> By the design and implementation of workqueues, a given work item
> cannot be re-entrant.
>
> SRCU has only the single work item ssp->work, so no concurrency can
> arise in its state machine, and the mutexes guarding against such
> concurrency are unnecessary.  On the other hand, using two work items
> to exploit the window from the release of srcu_gp_mutex to the end of
> srcu_gp_end() yields only a limited parallelism gain. [1]
>
> To forestall further attempts at improving parallelism by using two
> work_struct items, put some notes around srcu_gp_mutex and
> srcu_cb_mutex.  Keep the mutexes rather than removing them, in case
> the state machine is someday moved out from under the workqueue's
> non-reentrance protection.

Thank you for continuing to push on this!  But let's step back, take a
wider view, and summarize a bit.

First, as we have discussed, it is necessary to make RCU work not just
right now, but also to keep it from becoming too fragile.  This notion
of "fragile" isn't yet something that can be reasonably measured, but
one could unreasonably measure it either by counting the number of
words of English required to explain why the code works, or by making
lots of random changes and seeing what fraction of those changes break
something.  These unreasonable measures are, just like the name says,
unreasonable.  But I hope that they give an intuitive notion of what
the goal must be.  Again, I don't know of a reasonable scientific (let
alone mathematical) way of expressing it.

As you noted, the current state is unsatisfying.  There are two
different mechanisms providing the same mutual exclusion: the
workqueue non-reentrance guarantee (sketched below) and the mutexes.
This code is nowhere near a fastpath, so the added overhead is
absolutely *not* a concern (not yet, that we know of, anyway), but
there are still risks, for example:

1.	Latent bugs could be introduced that suddenly make their
	presence known should the code ever come to rely solely on the
	mutexes for mutual exclusion.

2.	People could imagine other purposes for either the single
	workqueue or the mutexes, and inject bugs based on the fact
	that their imagination of the design has diverged from the
	actual design.
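To make the guarantee everyone is relying on here concrete, below is a
minimal sketch of workqueue non-reentrance in the form of a
self-contained demonstration module.  This is illustration only, not
the SRCU code; all of the demo_* names are made up:

	#include <linux/module.h>
	#include <linux/workqueue.h>
	#include <linux/delay.h>
	#include <linux/smp.h>

	static void demo_fn(struct work_struct *unused);
	static DECLARE_WORK(demo_work, demo_fn);

	static void demo_fn(struct work_struct *unused)
	{
		/*
		 * A given work_struct is non-reentrant: even if
		 * demo_work is requeued while this handler is still
		 * running, the workqueue machinery will not run the
		 * handler concurrently with itself.  Requeueing
		 * serializes; it never parallelizes.
		 */
		pr_info("demo_fn() on CPU %d\n", raw_smp_processor_id());
		msleep(100);
	}

	static int __init demo_init(void)
	{
		/* Second queuing is coalesced or deferred, never concurrent. */
		schedule_work(&demo_work);
		schedule_work(&demo_work);
		return 0;
	}

	static void __exit demo_exit(void)
	{
		flush_work(&demo_work);
	}

	module_init(demo_init);
	module_exit(demo_exit);
	MODULE_LICENSE("GPL");

Because ssp->work is the only work item driving the grace-period state
machine, this guarantee is what currently makes srcu_cb_mutex and
srcu_gp_mutex redundant.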
These risks are not huge, especially compared to other risks in the
code base.  For but one example, the code that provides the
memory-ordering guarantees for synchronize_rcu() is intricate and
spread widely across the code, as it must be in order to keep
important fastpaths fast, a benefit important enough to make that risk
worthwhile.

So what are the options here?

1.	Leave the code as it is, with the non-huge risks called out
	above.

2.	Add comments, as in this commit.  One risk here is that someone
	later adds the concurrency but fails to remove the comments,
	which then become obsolete and confusing.

3.	Add words describing the current doubly provided mutual
	exclusion to one of the RCU design documents.

4.	Leave mainline and -rcu as they are, but add your
	multiple-workqueue patch to one of the RCU design documents
	covering SRCU so that it doesn't get lost, should it ever
	become necessary to speed up SRCU grace periods.

5.	Leave mainline as it is, but add your multiple-workqueue patch
	to -rcu with a not-for-mainline branch marking its presence so
	that this change can easily be applied later, if needed.  (For
	another example of this, see the sysidle.2017.05.11a branch,
	which was actually in mainline for some time and was later
	removed because it was not being used.)

6.	Make the workqueue supply concurrency, as in your earlier
	patch.  As noted in the email threads, this adds risk for no
	known user-visible benefit.

7.	Remove the mutexes and rely on the workqueue for the needed
	mutual exclusion.  This is unlikely to make the code more
	readable, and it makes adding concurrency more difficult should
	that ever be needed.

8.	Make the workqueue supply concurrency, but only when a given
	kernel-boot parameter or Kconfig option is supplied, and enable
	that concurrency only during selected rcutorture tests.  This
	has the advantage of actually testing the placement and usage
	of the mutexes, but on the other hand it increases test effort
	for no known end-user-visible benefit.  (A sketch of this
	approach follows the list.)
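If option 8 were pursued, one possible shape might be the following.
This is purely a hypothetical sketch: the CONFIG_SRCU_CONCURRENT_GP_TEST
symbol and all of the demo_* names are made up, and a real change would
instead touch the srcu_struct work items directly:

	#include <linux/kconfig.h>
	#include <linux/mutex.h>
	#include <linux/workqueue.h>

	static void demo_gp_work_fn(struct work_struct *unused);

	static DEFINE_MUTEX(demo_gp_mutex);
	static DECLARE_WORK(demo_gp_work0, demo_gp_work_fn);
	static DECLARE_WORK(demo_gp_work1, demo_gp_work_fn); /* Testing only. */

	static void demo_gp_work_fn(struct work_struct *unused)
	{
		/*
		 * With two distinct work items invoking this handler,
		 * workqueue non-reentrance no longer serializes it, so
		 * the mutex becomes load-bearing and rcutorture would
		 * actually exercise its placement.
		 */
		mutex_lock(&demo_gp_mutex);
		/* ... advance the grace-period state machine here ... */
		mutex_unlock(&demo_gp_mutex);
	}

	static void demo_schedule_gp(void)
	{
		schedule_work(&demo_gp_work0);
		/* IS_ENABLED() evaluates to 0 if the option is not set. */
		if (IS_ENABLED(CONFIG_SRCU_CONCURRENT_GP_TEST))
			schedule_work(&demo_gp_work1);
	}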
Are there other options?

							Thanx, Paul

> Some test data showing how dual work_struct items affect SRCU
> performance:
>
> 1. On a 144-CPU hpe-dl560gen10:
>
> modprobe rcutorture torture_type=srcud fwd_progress=144 fwd_progress_holdoff=1 shutdown_secs=36000 stat_interval=60 verbose=1
>
> # baseline
> [37587.632155] srcud: End-test grace-period state: g28287244 f0x0 total-gps=28287244
> # dual work_struct
> [36443.468017] srcud: End-test grace-period state: g29026056 f0x0 total-gps=29026056
>
> 2. On a 256-CPU amd-milan:
>
> modprobe rcutorture torture_type=srcud fwd_progress=256 fwd_progress_holdoff=1 shutdown_secs=36000 stat_interval=60 verbose=1
>
> # baseline
> [36093.605732] srcud: End-test grace-period state: g10850284 f0x0 total-gps=10850284
> # dual work_struct
> [36093.856713] srcud: End-test grace-period state: g10672632 f0x0 total-gps=10672632
>
> The first test shows about a 2.6% improvement, while the second test
> shows no significant effect.
>
> [1]: https://lore.kernel.org/rcu/Y5nC2yBhaUYirByo@xxxxxxxxxxxxxxxxxxxxxxxxxx/T/#maf99cf1d92077a1b0b8e5c411c79fa4444576624
>
> Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx>
> Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
> Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> Cc: "Zhang, Qiang1" <qiang1.zhang@xxxxxxxxx>
> Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
> Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> To: rcu@xxxxxxxxxxxxxxx
> ---
>  include/linux/srcutree.h | 4 ++++
>  kernel/rcu/srcutree.c    | 5 ++++-
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
> index 558057b517b7..f199c31493ab 100644
> --- a/include/linux/srcutree.h
> +++ b/include/linux/srcutree.h
> @@ -66,8 +66,12 @@ struct srcu_struct {
>  					/* First node at each level. */
>  	int srcu_size_state;		/* Small-to-big transition state. */
>  	struct mutex srcu_cb_mutex;	/* Serialize CB preparation. */
> +					/* No effect at present: the single ssp->work */
> +					/* meets the workqueue non-reentrance condition. */
>  	spinlock_t __private lock;	/* Protect counters and size state. */
>  	struct mutex srcu_gp_mutex;	/* Serialize GP work. */
> +					/* No effect at present: the single ssp->work */
> +					/* meets the workqueue non-reentrance condition. */
>  	unsigned int srcu_idx;		/* Current rdr array element. */
>  	unsigned long srcu_gp_seq;	/* Grace-period seq #. */
>  	unsigned long srcu_gp_seq_needed; /* Latest gp_seq needed. */
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index ce39907fa381..8ebb1a240219 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -815,7 +815,10 @@ static void srcu_gp_end(struct srcu_struct *ssp)
>  	struct srcu_node *snp;
>  	int ss_state;
>  
> -	/* Prevent more than one additional grace period. */
> +	/*
> +	 * Prevent more than one additional grace period.  At present this
> +	 * has no effect; see the note at the mutex's definition.
> +	 */
>  	mutex_lock(&ssp->srcu_cb_mutex);
>  
>  	/* End the current grace period. */
> -- 
> 2.31.1
> 