Hi,

The documentation gives the following rationale for percpu-rwsem:

"The problem with traditional read-write semaphores is that when multiple cores take the lock for reading, the cache line containing the semaphore is bouncing between L1 caches of the cores, causing performance degradation."

However, it appears to me that the "rss" member of struct percpu_rw_semaphore, which is used by RCU-sync, is not a per-cpu element. So even in the fast path (only readers, no writers), the cache line containing rss is shared and will bounce between CPUs. For that matter, even the cache line containing the percpu_rw_semaphore itself will bounce among multiple reader CPUs.

So how does percpu-rwsem eliminate cache line bouncing in the common case? Could you let me know what I am missing?

Thanks a lot.
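
P.S. To make the question concrete, the reader-side access pattern I am asking about can be modelled in userspace roughly as below. This is only a sketch under stated assumptions, not the kernel code: `model_percpu_rwsem`, `writer_pending`, `reader_slot`, `NR_CPUS_MODEL` and the 64-byte `CACHELINE` are all hypothetical names and values. `writer_pending` stands in for the shared rss state and `slots[]` stands in for the per-cpu read_count. It shows which memory a reader loads and which it stores to. It is not a working lock, since there is no writer side and no ordering between the check and the increment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4   /* hypothetical CPU count for this model */
#define CACHELINE     64  /* assumed cache-line size in bytes */

/* One counter per modelled CPU, aligned so each has its own cache line. */
struct reader_slot {
    _Alignas(CACHELINE) atomic_uint count;
};
static_assert(sizeof(struct reader_slot) == CACHELINE,
              "each slot should occupy exactly one cache line");

struct model_percpu_rwsem {
    /* Stand-in for rss: shared by all readers, only loaded on the fast path. */
    atomic_bool writer_pending;
    /* Stand-in for the per-cpu read_count: each reader stores to its own slot. */
    struct reader_slot slots[NR_CPUS_MODEL];
};

/* Reader fast path: one load of shared state, one store to a private slot.
 * Returns false where real code would have to fall back to a slow path. */
static bool model_down_read_fast(struct model_percpu_rwsem *sem, int cpu)
{
    if (atomic_load_explicit(&sem->writer_pending, memory_order_acquire))
        return false;
    atomic_fetch_add_explicit(&sem->slots[cpu].count, 1, memory_order_relaxed);
    return true;
}

/* Reader release: again touches only the caller's own slot. */
static void model_up_read(struct model_percpu_rwsem *sem, int cpu)
{
    atomic_fetch_sub_explicit(&sem->slots[cpu].count, 1, memory_order_release);
}

/* Sum of all slots, i.e. what a writer would have to inspect. */
static unsigned int model_active_readers(struct model_percpu_rwsem *sem)
{
    unsigned int sum = 0;
    for (int i = 0; i < NR_CPUS_MODEL; i++)
        sum += atomic_load_explicit(&sem->slots[i].count, memory_order_relaxed);
    return sum;
}
```

In this model the only store a reader performs goes to `slots[cpu]`, while `writer_pending` is only loaded. My question is whether that load of the shared field still counts as the kind of bouncing the documentation describes.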