Question about cacheline bouncing with percpu-rwsem and rcu-sync

Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> · Fri, 31 May 2019 09:10:16 -0400

Hi,
The percpu-rwsem documentation gives the following rationale:

The problem with traditional read-write semaphores is that when multiple
cores take the lock for reading, the cache line containing the semaphore
is bouncing between L1 caches of the cores, causing performance
degradation.

However, it appears to me that the "rss" element of struct
percpu_rw_semaphore, which is used by RCU-sync, is not a per-cpu element.
So even in the fastpath case (only readers and no writers), the cacheline
containing rss is shared and will bounce among multiple CPUs. For that
matter, even the cacheline containing the percpu_rw_semaphore itself will
bounce among multiple reader CPUs.
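
To make the question concrete, here is a simplified user-space model of the
layout as I understand it. This is only a sketch, not the kernel code: the
names model_rwsem, writer_active, percpu_count and the model_* helpers are
made up for illustration, writer_active merely stands in for the rss state,
and the "CPU" is passed in as an index instead of using real per-cpu
variables. It is not a correct lock; it only shows which fields a reader
writes and which it only reads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4   /* hypothetical number of modeled CPUs */
#define CACHELINE     64  /* assumed cache line size in bytes */

/* One reader counter per modeled CPU, padded to its own cache line. */
struct percpu_count {
    _Alignas(CACHELINE) unsigned int count;
};

struct model_rwsem {
    /* Shared by all CPUs; stands in for the rss state (assumption). */
    atomic_int writer_active;
    /* Stands in for the per-cpu read_count. */
    struct percpu_count read_count[NR_CPUS_MODEL];
};

static void model_init(struct model_rwsem *sem)
{
    atomic_init(&sem->writer_active, 0);
    for (int i = 0; i < NR_CPUS_MODEL; i++)
        sem->read_count[i].count = 0;
}

/*
 * Reader fast path: one store to this CPU's own counter and one load of
 * the shared flag. Returns false when a slow path would be needed.
 */
static bool model_down_read_fast(struct model_rwsem *sem, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    sem->read_count[cpu].count++;               /* write: CPU-local line */
    if (atomic_load(&sem->writer_active)) {     /* read only: shared line */
        sem->read_count[cpu].count--;           /* back out, go slow path */
        return false;
    }
    return true;
}

static void model_up_read_fast(struct model_rwsem *sem, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    sem->read_count[cpu].count--;               /* write: CPU-local line */
}

/* A writer would have to sum all per-CPU counters to find active readers. */
static unsigned int model_readers(const struct model_rwsem *sem)
{
    unsigned int sum = 0;
    for (int i = 0; i < NR_CPUS_MODEL; i++)
        sum += sem->read_count[i].count;
    return sum;
}
```

In this model the reader fast path writes only to its own padded counter,
and the only access to a shared field is the load of writer_active.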

So how does percpu-rwsem eliminate cache line bouncing in the common
case? Could you let me know what I am missing?

Thanks a lot.