On Fri, May 31, 2019 at 9:45 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> On 05/31, Joel Fernandes wrote:
> >
> > The problem with traditional read-write semaphores is that when multiple
> > cores take the lock for reading, the cache line containing the semaphore
> > is bouncing between L1 caches of the cores, causing performance
> > degradation.
> >
> > However, it appears to me that the struct percpu_rwsem "rss" element
> > which is used by the RCU-sync is not a per-cpu element. So even in the
> > fastpath case (only readers and no writers), the cacheline containing
> > rss is shared and will bounce by multiple CPUs. For that matter, even
> > the cacheline containing the percpu_rw_semaphore itself will be bounce
> > among multiple reader CPUs.
>
> The readers won't modify this memory? read_lock/unlock will only update
> the per-cpu counter, ->read_count.

Makes sense, I was confusing cache misses for cache bouncing. Thanks for
clarification!
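
To make the distinction concrete, here is a rough userspace model of the
point above. It is only a sketch and not the kernel implementation: the
model_* names, the writer_pending flag standing in for the rss state, the
fixed CPU count and the 64-byte cache line size are all assumptions made
for illustration. On the fast path a reader only loads the shared field,
so that line can stay in the shared state in every CPU's cache, and it
only stores to its own per-CPU counter slot.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS   4   /* hypothetical CPU count for this model */
#define MODEL_CACHELINE 64  /* assumed cache line size in bytes */

/* One reader counter per CPU, each padded out to its own cache line. */
struct model_cpu_count {
	alignas(MODEL_CACHELINE) unsigned int read_count;
};

struct model_percpu_rwsem {
	/* Shared by all CPUs, but readers only load it (stand-in for rss). */
	bool writer_pending;
	/* Stored to by readers, but each CPU touches only its own slot. */
	struct model_cpu_count cpu[MODEL_NR_CPUS];
};

/* Reader fast path: one shared load, one store to a CPU-private line. */
static bool model_read_lock(struct model_percpu_rwsem *sem, size_t cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	if (sem->writer_pending)	/* load only, no invalidation of other caches */
		return false;		/* slow path is omitted in this model */
	sem->cpu[cpu].read_count++;	/* store hits this CPU's own line */
	return true;
}

static void model_read_unlock(struct model_percpu_rwsem *sem, size_t cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	sem->cpu[cpu].read_count--;
}

/* Only a writer needs the global view, so only it walks every slot. */
static unsigned int model_active_readers(const struct model_percpu_rwsem *sem)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < MODEL_NR_CPUS; i++)
		sum += sem->cpu[i].read_count;
	return sum;
}
```

In this model a reader may still take a cache miss the first time it loads
writer_pending, but since no reader ever stores to that line, nothing
forces it to bounce between CPUs afterwards. The model is single-threaded
and uses no atomics or barriers, so it shows only the memory layout and
access pattern, not the real synchronization.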