On Sun, Dec 18, 2022 at 09:53:13AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 14, 2022 at 10:24:46PM +0800, Pingfan Liu wrote:
> > Due to the design and implementation of workqueue, the same work item
> > cannot be re-entrant.
> >
> > SRCU has only a single work item, ssp->work, so no concurrency can
> > arise in its state machine. Hence the mutexes guarding against
> > concurrency in the state machine are unnecessary. On the other hand,
> > using two work items to exploit the window from the release of
> > srcu_gp_mutex to the end of srcu_gp_end() to enhance parallelism has
> > only a limited effect. [1]
> >
> > To save anyone from trying to improve parallelism by using two
> > work_structs, put some notes around srcu_gp_mutex and srcu_cb_mutex.
> > Keep the mutexes instead of removing them, in case the state machine
> > is one day moved out of the workqueue's non-reentrancy protection.
>
> Thank you for continuing to push on this!
>
> But let's step back and take a wider view, plus summarize a bit.
>
> First, as we have discussed, it is necessary to make RCU work not just
> right now, but also to keep it from becoming too fragile. This notion of
> "fragile" isn't yet something that can be reasonably measured, but one
> could unreasonably measure it either by comparing the number of words
> of English required to explain why it works on the one hand, or by making
> lots of random changes and seeing what fraction of those changes break
> something on the other. These unreasonable measures are, just like the
> name says, unreasonable. But I hope that they give an intuitive notion
> of what the goal must be. Again, I don't know of a reasonable scientific
> (let alone mathematical) way of expressing it.

Yes, I get the point of srcu_gp_mutex, which keeps the code from becoming
too fragile. But the carefully designed srcu_cb_mutex may leave room for
people to speculate. My purpose is just to remind anyone in the future
that this has already been considered and tried.
> As you noted, the current state is unsatisfying. There are two different
> mechanisms providing mutual exclusion. This code is nowhere near a
> fastpath, so the added overhead is absolutely *not* a concern (not yet,
> that we know of, anyway), but there are still risks, for example:
>
> 1.	Latent bugs could be introduced that suddenly make their presence
>	known should the code ever rely solely on the mutexes for
>	mutual exclusion.
>
> 2.	People imagine other purposes for either the single workqueue
>	or the mutexes, and inject bugs based on the fact that their
>	imagination of the design has diverged from the actual design.
>
> These risks are not huge, especially compared to other risks in the
> code base. For but one example, the code that provides the memory-order
> guarantees for synchronize_rcu() is intricate and spread widely over the
> code, as it must be in order to make important fastpaths fast, which
> is a benefit important enough to make taking that risk worthwhile.

As said above, I understand this, and srcu_gp_mutex is not my target.
(I am sorry that my comment may have given the wrong impression.)

> So what are the options here?
>
> 1.	Leave it as it is, with the non-huge risks as called out above.

Given the test data, the outlook for this patch is dim. I am fine with
option one. But if necessary, I am also OK with a comment around
srcu_cb_mutex, which may save someone's time in the future.

Thanks,

	Pingfan

> 2.	Add comments, as in this commit. One risk here is that someone
>	adds the concurrency, but fails to remove the comments, which
>	are now obsolete and confusing.
>
> 3.	Add words describing the current doubly-provided mutual exclusion
>	to one of the RCU design documents.
>
> 4.	Leave mainline and -rcu as they are, but add your
>	multiple-workqueue patch to one of the RCU design documents
>	covering SRCU so that it doesn't get lost should it ever become
>	necessary to speed up SRCU grace periods.
>
> 5.	Leave mainline as it is, but add your multiple-workqueue patch to
>	-rcu with a not-for-mainline branch marking its presence so that
>	this change can be easily applied later, if needed. (For another
>	example of this, see the sysidle.2017.05.11a branch, which was
>	actually in mainline for some time and then later removed because
>	it was not being used.)
>
> 6.	Make the workqueue supply concurrency, as in your earlier patch.
>	As noted in the email threads, this adds risk for no known
>	user-visible benefit.
>
> 7.	Remove the mutexes, and rely on the workqueue for the needed
>	mutual exclusion. This is unlikely to make the code more
>	readable, and makes it more difficult to add concurrency if it
>	is ever needed.
>
> 8.	Make the workqueue supply concurrency, but only when a given
>	kernel-boot parameter or Kconfig option is supplied. Enable
>	this concurrency only during selected rcutorture tests.
>	This has the advantage of actually testing the placement and
>	usage of the mutexes, but on the other hand it increases test
>	effort for no known end-user-visible benefit.
>
> Are there other options?
>
>							Thanx, Paul
>
> > Some test data to show how dual work_structs affect SRCU
> > performance:
> >
> > -1.
> >	on 144 cpus hpe-dl560gen10
> > modprobe rcutorture torture_type=srcud fwd_progress=144 fwd_progress_holdoff=1 shutdown_secs=36000 stat_interval=60 verbose=1
> > # base line
> > [37587.632155] srcud: End-test grace-period state: g28287244 f0x0 total-gps=28287244
> > # dual work_struct
> > [36443.468017] srcud: End-test grace-period state: g29026056 f0x0 total-gps=29026056
> >
> > -2.	on 256 cpus amd-milan
> > modprobe rcutorture torture_type=srcud fwd_progress=256 fwd_progress_holdoff=1 shutdown_secs=36000 stat_interval=60 verbose=1
> > # base line
> > [36093.605732] srcud: End-test grace-period state: g10850284 f0x0 total-gps=10850284
> > # dual work_struct
> > [36093.856713] srcud: End-test grace-period state: g10672632 f0x0 total-gps=10672632
> >
> > The first test shows about a 2.6% improvement, while the second test
> > shows no significant effect.
> >
> > [1]: https://lore.kernel.org/rcu/Y5nC2yBhaUYirByo@xxxxxxxxxxxxxxxxxxxxxxxxxx/T/#maf99cf1d92077a1b0b8e5c411c79fa4444576624
> >
> > Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx>
> > Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
> > Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> > Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> > Cc: "Zhang, Qiang1" <qiang1.zhang@xxxxxxxxx>
> > Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
> > Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> > To: rcu@xxxxxxxxxxxxxxx
> > ---
> >  include/linux/srcutree.h | 4 ++++
> >  kernel/rcu/srcutree.c    | 5 ++++-
> >  2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
> > index 558057b517b7..f199c31493ab 100644
> > --- a/include/linux/srcutree.h
> > +++ b/include/linux/srcutree.h
> > @@ -66,8 +66,12 @@ struct srcu_struct {
> >  						/* First node at each level. */
> >  	int srcu_size_state;			/* Small-to-big transition state. */
> >  	struct mutex srcu_cb_mutex;		/* Serialize CB preparation. */
> > +						/* Has no effect at present: the single */
> > +						/* ssp->work is workqueue non-reentrant. */
> >  	spinlock_t __private lock;		/* Protect counters and size state. */
> >  	struct mutex srcu_gp_mutex;		/* Serialize GP work. */
> > +						/* Has no effect at present: the single */
> > +						/* ssp->work is workqueue non-reentrant. */
> >  	unsigned int srcu_idx;			/* Current rdr array element. */
> >  	unsigned long srcu_gp_seq;		/* Grace-period seq #. */
> >  	unsigned long srcu_gp_seq_needed;	/* Latest gp_seq needed. */
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index ce39907fa381..8ebb1a240219 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -815,7 +815,10 @@ static void srcu_gp_end(struct srcu_struct *ssp)
> >  	struct srcu_node *snp;
> >  	int ss_state;
> >
> > -	/* Prevent more than one additional grace period. */
> > +	/*
> > +	 * Prevent more than one additional grace period. At present this
> > +	 * has no effect; see the note at the definition.
> > +	 */
> >  	mutex_lock(&ssp->srcu_cb_mutex);
> >
> >  	/* End the current grace period. */
> > --
> > 2.31.1
> >