On Tue, Dec 20, 2022 at 12:00:58PM -0500, Mathieu Desnoyers wrote:
> On 2022-12-19 20:04, Joel Fernandes wrote:
> The main benefit I expect is improved performance of the grace period
> implementation in common cases where there are few or no readers present,
> especially on machines with many cpus.
>
> It allows scanning both periods (0/1) for each cpu within the same pass,
> therefore loading both period's unlock counters sitting in the same cache
> line at once (improved locality), and then loading both period's lock
> counters, also sitting in the same cache line.
>
> It also allows skipping the period flip entirely if there are no readers
> present, which is an -arguably- tiny performance improvement as well.

I would indeed expect a performance improvement if there are no readers in
the active period/idx, but if there are, it's a performance penalty due to
the extra scans.

So my main questions are:

* Is the no-readers-present case the most likely one? I guess it depends
  on the ssp.

* Does the SRCU update side deserve to be optimized with added code? (We
  are not debating about removing the flip, but rather about adding a
  fast-path and keeping the flip as a slow-path.)

* The SRCU machinery is already quite complicated. Look how we little
  things lock ourselves in for days doing our exegesis of the SRCU state
  machine. And halfway through it we are still debating some ordering.
  Is it worth adding a new path there?

Thanks.
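
For reference, the single-pass scan described in the quoted text could be sketched roughly as below. This is only a user-space illustration and not the kernel's SRCU code: the names `percpu_count`, `scan_result`, `scan_both_periods()` and `can_skip_flip()` are hypothetical, the counters are plain integers, and the memory barriers a real implementation needs between the two loops are only marked by a comment.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4	/* assumed small cpu count, for illustration only */

/*
 * Hypothetical per-cpu layout: both periods' unlock counters sit next to
 * each other, as do both periods' lock counters, so one pass over the cpus
 * can read the pair together.
 */
struct percpu_count {
	unsigned long lock[2];
	unsigned long unlock[2];
};

struct scan_result {
	bool quiescent[2];	/* true if no reader was seen in that period */
};

/* Sum the unlock counters of both periods, then the lock counters. */
static struct scan_result scan_both_periods(const struct percpu_count *c,
					    size_t nr_cpus)
{
	unsigned long unlocks[2] = { 0, 0 };
	unsigned long locks[2] = { 0, 0 };
	struct scan_result r;
	size_t i;

	for (i = 0; i < nr_cpus; i++) {
		unlocks[0] += c[i].unlock[0];
		unlocks[1] += c[i].unlock[1];
	}

	/* A real implementation needs a full memory barrier here. */

	for (i = 0; i < nr_cpus; i++) {
		locks[0] += c[i].lock[0];
		locks[1] += c[i].lock[1];
	}

	r.quiescent[0] = (locks[0] == unlocks[0]);
	r.quiescent[1] = (locks[1] == unlocks[1]);
	return r;
}

/* Fast-path check: with no readers in either period, no flip is needed. */
static bool can_skip_flip(const struct scan_result *r)
{
	return r->quiescent[0] && r->quiescent[1];
}
```

With readers present in the active period, `can_skip_flip()` returns false and the updater still has to fall back to the flip and rescan, which is the extra cost the questions above are about.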