On Fri, May 31, 2019 at 9:52 AM Paul E. McKenney <paulmck@xxxxxxxxxxxxx> wrote:
>
> On Fri, May 31, 2019 at 09:10:16AM -0400, Joel Fernandes wrote:
> > Hi,
> > As per the rationale for percpu-rwsem, the Documentation says:
> >
> > The problem with traditional read-write semaphores is that when multiple
> > cores take the lock for reading, the cache line containing the semaphore
> > is bouncing between L1 caches of the cores, causing performance
> > degradation.
> >
> > However, it appears to me that the struct percpu_rwsem "rss" element
> > which is used by the RCU-sync is not a per-cpu element. So even in the
> > fastpath case (only readers and no writers), the cacheline containing
> > rss is shared and will bounce between multiple CPUs. For that matter,
> > even the cacheline containing the percpu_rw_semaphore itself will
> > bounce among multiple reader CPUs.
> >
> > So how does percpu-rwsem eliminate cache line bouncing in the common
> > case? Could you let me know what I am missing?
> >
> > Thanks a lot.
>
> The accesses are loads, except for the __this_cpu_inc(), which updates
> a per-CPU variable. The locations loaded will replicate across the
> CPUs' caches and the per-CPU variables are private to each CPU. Hence
> no cacheline bouncing.

Makes sense, thanks for the answer!

> Either way, it would be good for you to just try it. Create a kernel
> module or similar that hammers on percpu_down_read() and percpu_up_read(),
> and empirically check the scalability on a largish system. Then compare
> this to down_read() and up_read().

Will do! thanks.
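Concretely, the reader fastpath Paul describes looks roughly like the sketch
below. This is a simplified paraphrase of percpu_down_read() from
include/linux/percpu-rwsem.h; lockdep annotations, barriers, and
version-specific details are elided, so treat the exact shape as
approximate:

/* Simplified sketch of the reader fastpath (approximate, details elided). */
static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
{
	preempt_disable();
	/* Private per-CPU increment: no cross-CPU cacheline traffic. */
	__this_cpu_inc(*sem->read_count);
	/*
	 * A plain load of the rcu_sync ("rss") state.  With no writer
	 * active this word is never stored to, so its cacheline
	 * replicates read-only into every CPU's cache and never bounces.
	 */
	if (unlikely(!rcu_sync_is_idle(&sem->rss)))
		__percpu_down_read(sem, false);	/* writer active: slowpath */
	preempt_enable();
}

So the "rss" cacheline is indeed shared, but sharing only causes bouncing
when some CPU writes the line; read-only sharing is exactly what caches
handle well.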
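And for the experiment, a minimal sketch of such a test module, assuming a
stock kernel tree (the module name "pcrwsem_test", the "use_percpu"
parameter, and the reporting format are illustrative, not an existing kernel
selftest; error handling and CPU hotplug are elided). It pins one kthread
per CPU, each hammering the read-side path, and counts acquisitions:

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/err.h>
#include <linux/percpu.h>
#include <linux/percpu-rwsem.h>
#include <linux/rwsem.h>

DEFINE_STATIC_PERCPU_RWSEM(test_pcpu_rwsem);
static DECLARE_RWSEM(test_rwsem);

static bool use_percpu = true;	/* false: benchmark plain rwsem instead */
module_param(use_percpu, bool, 0444);

static struct task_struct **threads;
static DEFINE_PER_CPU(unsigned long, loops);

/* One of these runs pinned to each CPU, hammering the read-side path. */
static int reader_fn(void *unused)
{
	while (!kthread_should_stop()) {
		if (use_percpu) {
			percpu_down_read(&test_pcpu_rwsem);
			percpu_up_read(&test_pcpu_rwsem);
		} else {
			down_read(&test_rwsem);
			up_read(&test_rwsem);
		}
		this_cpu_inc(loops);
		cond_resched();
	}
	return 0;
}

static int __init pcrwsem_test_init(void)
{
	int cpu;

	threads = kcalloc(nr_cpu_ids, sizeof(*threads), GFP_KERNEL);
	if (!threads)
		return -ENOMEM;
	for_each_online_cpu(cpu) {
		struct task_struct *t;

		t = kthread_create(reader_fn, NULL, "pcrwsem/%d", cpu);
		if (IS_ERR(t))
			continue;
		kthread_bind(t, cpu);	/* pin before first wakeup */
		threads[cpu] = t;
		wake_up_process(t);
	}
	return 0;
}

static void __exit pcrwsem_test_exit(void)
{
	unsigned long total = 0;
	int cpu;

	for_each_online_cpu(cpu) {
		if (threads[cpu])
			kthread_stop(threads[cpu]);
		total += per_cpu(loops, cpu);
	}
	kfree(threads);
	pr_info("pcrwsem_test: use_percpu=%d, %lu total read acquisitions\n",
		use_percpu, total);
}

module_init(pcrwsem_test_init);
module_exit(pcrwsem_test_exit);
MODULE_LICENSE("GPL");

Load it for a fixed interval on a many-CPU box, rmmod, and compare the
reported totals for use_percpu=1 versus use_percpu=0. If the explanation
above holds, the percpu variant's throughput should scale roughly linearly
with CPU count, while the plain rwsem's should collapse as the shared
counter cacheline bounces between cores.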