Re: [RFC 0/2] srcu: Remove pre-flip memory barrier

Frederic Weisbecker <frederic@xxxxxxxxxx> · Wed, 21 Dec 2022 01:07:15 +0100

On Tue, Dec 20, 2022 at 12:00:58PM -0500, Mathieu Desnoyers wrote:
> On 2022-12-19 20:04, Joel Fernandes wrote:
> The main benefit I expect is improved performance of the grace period
> implementation in common cases where there are few or no readers present,
> especially on machines with many cpus.
> 
> It allows scanning both periods (0/1) for each cpu within the same pass,
> therefore loading both period's unlock counters sitting in the same cache
> line at once (improved locality), and then loading both period's lock
> counters, also sitting in the same cache line.
> 
> It also allows skipping the period flip entirely if there are no readers
> present, which is an -arguably- tiny performance improvement as well.

I would indeed expect performance improvement if there are no readers in the
active period/idx but if there are, it's a performance penalty due to the extra
scans.

So my mean questions are:

* Is the no-present-readers the most likely case? I guess it depends on the ssp.

* Does the SRCU update side deserve to be optimized with added code (because
  we are not debating about removing the flip, rather about adding a fast-path
  and keep the flip as a slow-path)

* The SRCU machinery is already quite complicated. Look how we little things lock
  ourselves in for days doing our exegesis of SRCU state machine. And halfway
  through it we are still debating some ordering. Is it worth adding a new path there?

Thanks.