On Wed, Dec 14, 2022 at 10:24:46PM +0800, Pingfan Liu wrote:
> By the design and implementation of workqueues, a given work item
> cannot be re-entrant.
>
> SRCU has only the single work item ssp->work, so no concurrency can
> arise in its state machine, and the mutexes guarding against such
> concurrency are unnecessary.  On the other hand, using two work items
> to exploit the window from the release of srcu_gp_mutex to the end of
> srcu_gp_end() yields only a limited parallelism gain. [1]
>
> To forestall further attempts at improving parallelism by using two
> work_struct items, put some notes around srcu_gp_mutex and
> srcu_cb_mutex.  Keep the mutexes rather than removing them, in case
> the state machine is someday moved out from under the workqueue's
> non-reentrance protection.

Thank you for continuing to push on this!  But let's step back, take a
wider view, and summarize a bit.

First, as we have discussed, it is necessary to make RCU work not just
right now, but also to keep it from becoming too fragile.  This notion
of "fragile" isn't yet something that can be reasonably measured, but
one could unreasonably measure it either by counting the number of
words of English required to explain why the code works, or by making
lots of random changes and seeing what fraction of those changes break
something.  These unreasonable measures are, just like the name says,
unreasonable.  But I hope that they give an intuitive notion of what
the goal must be.  Again, I don't know of a reasonable scientific (let
alone mathematical) way of expressing it.

As you noted, the current state is unsatisfying.  There are two
different mechanisms providing the same mutual exclusion: the
workqueue non-reentrance guarantee (sketched below) and the mutexes.
This code is nowhere near a fastpath, so the added overhead is
absolutely *not* a concern (not yet, that we know of, anyway), but
there are still risks, for example:

1.	Latent bugs could be introduced that suddenly make their
	presence known should the code ever come to rely solely on the
	mutexes for mutual exclusion.

2.	People could imagine other purposes for either the single
	workqueue or the mutexes, and inject bugs based on the fact
	that their imagination of the design has diverged from the
	actual design.
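To make the guarantee everyone is relying on here concrete, below is a
minimal sketch of workqueue non-reentrance in the form of a
self-contained demonstration module.  This is illustration only, not
the SRCU code; all of the demo_* names are made up:

	#include <linux/module.h>
	#include <linux/workqueue.h>
	#include <linux/delay.h>
	#include <linux/smp.h>

	static void demo_fn(struct work_struct *unused);
	static DECLARE_WORK(demo_work, demo_fn);

	static void demo_fn(struct work_struct *unused)
	{
		/*
		 * A given work_struct is non-reentrant: even if
		 * demo_work is requeued while this handler is still
		 * running, the workqueue machinery will not run the
		 * handler concurrently with itself.  Requeueing
		 * serializes; it never parallelizes.
		 */
		pr_info("demo_fn() on CPU %d\n", raw_smp_processor_id());
		msleep(100);
	}

	static int __init demo_init(void)
	{
		/* Second queuing is coalesced or deferred, never concurrent. */
		schedule_work(&demo_work);
		schedule_work(&demo_work);
		return 0;
	}

	static void __exit demo_exit(void)
	{
		flush_work(&demo_work);
	}

	module_init(demo_init);
	module_exit(demo_exit);
	MODULE_LICENSE("GPL");

Because ssp->work is the only work item driving the grace-period state
machine, this guarantee is what currently makes srcu_cb_mutex and
srcu_gp_mutex redundant.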
These risks are not huge, especially compared to other risks in the
code base.  For but one example, the code that provides the
memory-ordering guarantees for synchronize_rcu() is intricate and
spread widely across the code, as it must be in order to keep
important fastpaths fast, a benefit important enough to make that risk
worthwhile.

So what are the options here?

1.	Leave the code as it is, with the non-huge risks called out
	above.

2.	Add comments, as in this commit.  One risk here is that someone
	later adds the concurrency but fails to remove the comments,
	which then become obsolete and confusing.

3.	Add words describing the current doubly provided mutual
	exclusion to one of the RCU design documents.

4.	Leave mainline and -rcu as they are, but add your
	multiple-workqueue patch to one of the RCU design documents
	covering SRCU so that it doesn't get lost, should it ever
	become necessary to speed up SRCU grace periods.

5.	Leave mainline as it is, but add your multiple-workqueue patch
	to -rcu with a not-for-mainline branch marking its presence so
	that this change can easily be applied later, if needed.  (For
	another example of this, see the sysidle.2017.05.11a branch,
	which was actually in mainline for some time and was later
	removed because it was not being used.)

6.	Make the workqueue supply concurrency, as in your earlier
	patch.  As noted in the email threads, this adds risk for no
	known user-visible benefit.

7.	Remove the mutexes and rely on the workqueue for the needed
	mutual exclusion.  This is unlikely to make the code more
	readable, and it makes adding concurrency more difficult should
	that ever be needed.

8.	Make the workqueue supply concurrency, but only when a given
	kernel-boot parameter or Kconfig option is supplied, and enable
	that concurrency only during selected rcutorture tests.  This
	has the advantage of actually testing the placement and usage
	of the mutexes, but on the other hand it increases test effort
	for no known end-user-visible benefit.  (A sketch of this
	approach follows the list.)
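If option 8 were pursued, one possible shape might be the following.
This is purely a hypothetical sketch: the CONFIG_SRCU_CONCURRENT_GP_TEST
symbol and all of the demo_* names are made up, and a real change would
instead touch the srcu_struct work items directly:

	#include <linux/kconfig.h>
	#include <linux/mutex.h>
	#include <linux/workqueue.h>

	static void demo_gp_work_fn(struct work_struct *unused);

	static DEFINE_MUTEX(demo_gp_mutex);
	static DECLARE_WORK(demo_gp_work0, demo_gp_work_fn);
	static DECLARE_WORK(demo_gp_work1, demo_gp_work_fn); /* Testing only. */

	static void demo_gp_work_fn(struct work_struct *unused)
	{
		/*
		 * With two distinct work items invoking this handler,
		 * workqueue non-reentrance no longer serializes it, so
		 * the mutex becomes load-bearing and rcutorture would
		 * actually exercise its placement.
		 */
		mutex_lock(&demo_gp_mutex);
		/* ... advance the grace-period state machine here ... */
		mutex_unlock(&demo_gp_mutex);
	}

	static void demo_schedule_gp(void)
	{
		schedule_work(&demo_gp_work0);
		/* IS_ENABLED() evaluates to 0 if the option is not set. */
		if (IS_ENABLED(CONFIG_SRCU_CONCURRENT_GP_TEST))
			schedule_work(&demo_gp_work1);
	}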
Are there other options?

							Thanx, Paul

> Some test data showing how dual work_struct items affect SRCU
> performance:
>
> 1. On a 144-CPU hpe-dl560gen10:
>
> modprobe rcutorture torture_type=srcud fwd_progress=144 fwd_progress_holdoff=1 shutdown_secs=36000 stat_interval=60 verbose=1
>
> # baseline
> [37587.632155] srcud: End-test grace-period state: g28287244 f0x0 total-gps=28287244
> # dual work_struct
> [36443.468017] srcud: End-test grace-period state: g29026056 f0x0 total-gps=29026056
>
> 2. On a 256-CPU amd-milan:
>
> modprobe rcutorture torture_type=srcud fwd_progress=256 fwd_progress_holdoff=1 shutdown_secs=36000 stat_interval=60 verbose=1
>
> # baseline
> [36093.605732] srcud: End-test grace-period state: g10850284 f0x0 total-gps=10850284
> # dual work_struct
> [36093.856713] srcud: End-test grace-period state: g10672632 f0x0 total-gps=10672632
>
> The first test shows about a 2.6% improvement, while the second test
> shows no significant effect.
>
> [1]: https://lore.kernel.org/rcu/Y5nC2yBhaUYirByo@xxxxxxxxxxxxxxxxxxxxxxxxxx/T/#maf99cf1d92077a1b0b8e5c411c79fa4444576624
>
> Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx>
> Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
> Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> Cc: "Zhang, Qiang1" <qiang1.zhang@xxxxxxxxx>
> Cc: Boqun Feng <boqun.feng@xxxxxxxxx>
> Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> To: rcu@xxxxxxxxxxxxxxx
> ---
>  include/linux/srcutree.h | 4 ++++
>  kernel/rcu/srcutree.c    | 5 ++++-
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
> index 558057b517b7..f199c31493ab 100644
> --- a/include/linux/srcutree.h
> +++ b/include/linux/srcutree.h
> @@ -66,8 +66,12 @@ struct srcu_struct {
>  					/* First node at each level. */
>  	int srcu_size_state;		/* Small-to-big transition state. */
>  	struct mutex srcu_cb_mutex;	/* Serialize CB preparation. */
> +					/* No effect at present: the single ssp->work */
> +					/* meets the workqueue non-reentrance condition. */
>  	spinlock_t __private lock;	/* Protect counters and size state. */
>  	struct mutex srcu_gp_mutex;	/* Serialize GP work. */
> +					/* No effect at present: the single ssp->work */
> +					/* meets the workqueue non-reentrance condition. */
>  	unsigned int srcu_idx;		/* Current rdr array element. */
>  	unsigned long srcu_gp_seq;	/* Grace-period seq #. */
>  	unsigned long srcu_gp_seq_needed; /* Latest gp_seq needed. */
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index ce39907fa381..8ebb1a240219 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -815,7 +815,10 @@ static void srcu_gp_end(struct srcu_struct *ssp)
>  	struct srcu_node *snp;
>  	int ss_state;
>  
> -	/* Prevent more than one additional grace period. */
> +	/*
> +	 * Prevent more than one additional grace period.  At present this
> +	 * has no effect; see the note at the mutex's definition.
> +	 */
>  	mutex_lock(&ssp->srcu_cb_mutex);
>  
>  	/* End the current grace period. */
> -- 
> 2.31.1
> 