On Fri, May 31, 2019 at 10:43 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
[snip]
> >
> > Either way, it would be good for you to just try it. Create a kernel
> > module or similar that hammers on percpu_down_read() and percpu_up_read(),
> > and empirically check the scalability on a largish system. Then compare
> > this to down_read() and up_read()
>
> Will do! thanks.

I created a test for this, and the results are quite striking. The test
just stresses read lock/unlock for rwsem vs. percpu-rwsem.

The test was run on a dual-socket Intel x86_64 machine with 14 cores per
socket. It runs 10,000,000 loops for rwsem and for percpu-rwsem:
https://github.com/joelagnel/linux-kernel/commit/8fe968116bd887592301179a53b7b3200db84424

Graphs and results are here:
https://docs.google.com/spreadsheets/d/1cbVLNK8tzTZNTr-EDGDC0T0cnFCdFK3wg2Foj5-Ll9s/edit?usp=sharing

In the rwsem case, the completion time of the test grows roughly
exponentially with the number of threads, whereas for percpu-rwsem it
stays the same.

I could add this data to some of the documentation as well.

Thanks!

- Joel
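To reproduce the shape of this result without building a kernel module, the read-side difference can be modelled in userspace. This is a hypothetical sketch, not the test from the commit linked above and not the kernel's rwsem or percpu-rwsem code: `run_bench()`, `readers_held()`, `MODEL_SHARED` and `MODEL_PERTHREAD` are made-up names. It assumes the dominant read-side cost of rwsem is the atomic update of one shared count, whose cache line bounces between CPUs, while the percpu-rwsem fast path only touches a per-CPU counter. Writers, preemption and the rcu_sync machinery are left out entirely.

```c
/*
 * Userspace MODEL of the read-side fast paths (hypothetical, not kernel code):
 *   MODEL_SHARED    - every reader does an atomic RMW on ONE shared counter,
 *                     standing in for rwsem's single count.
 *   MODEL_PERTHREAD - each reader updates only its OWN cache-line-sized slot,
 *                     standing in for percpu-rwsem's per-CPU read count.
 * Writers are deliberately omitted; only read lock/unlock cost is modelled.
 */
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define MAX_THREADS 64
#define CACHELINE   64

enum model { MODEL_SHARED, MODEL_PERTHREAD };

/* One counter shared by all readers: its cache line bounces between CPUs. */
static _Atomic long shared_count;

/* One counter per reader, each aligned to (and filling) its own cache line. */
struct padded_count {
	_Alignas(CACHELINE) _Atomic long count;
};
static struct padded_count slot[MAX_THREADS];

struct worker {
	pthread_t tid;
	enum model model;
	int id;
	long loops;
};

static void *reader(void *arg)
{
	struct worker *w = arg;
	_Atomic long *mine = &slot[w->id].count;

	for (long i = 0; i < w->loops; i++) {
		if (w->model == MODEL_SHARED) {
			atomic_fetch_add(&shared_count, 1);	/* models down_read() */
			atomic_fetch_sub(&shared_count, 1);	/* models up_read() */
		} else {
			/* Only this thread writes *mine, so a non-locked
			 * load+store suffices, roughly like this_cpu_inc(). */
			long v = atomic_load_explicit(mine, memory_order_relaxed);
			atomic_store_explicit(mine, v + 1, memory_order_relaxed);
			v = atomic_load_explicit(mine, memory_order_relaxed);
			atomic_store_explicit(mine, v - 1, memory_order_relaxed);
		}
	}
	return NULL;
}

/* Number of read "locks" currently held; 0 once every reader has unlocked. */
long readers_held(enum model model)
{
	if (model == MODEL_SHARED)
		return atomic_load(&shared_count);

	long sum = 0;
	for (int i = 0; i < MAX_THREADS; i++)
		sum += atomic_load(&slot[i].count);
	return sum;
}

/* Runs nthreads readers, `loops` lock/unlock pairs each, and returns the
 * wall-clock time in ns, or -1 on bad arguments or thread-creation failure. */
long long run_bench(enum model model, int nthreads, long loops)
{
	struct worker w[MAX_THREADS];
	struct timespec t0, t1;
	int started = 0;

	if (nthreads < 1 || nthreads > MAX_THREADS || loops < 0)
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < nthreads; i++) {
		w[i] = (struct worker){ .model = model, .id = i, .loops = loops };
		if (pthread_create(&w[i].tid, NULL, reader, &w[i]) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(w[i].tid, NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	if (started != nthreads)
		return -1;
	return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
	       + (t1.tv_nsec - t0.tv_nsec);
}
```

Calling `run_bench()` for 1, 2, 4, ... threads on a multi-socket machine (built with something like `cc -std=c11 -O2 -pthread`) and printing the returned nanoseconds should show the shared-counter model's wall time rising with the thread count, while the per-thread model stays roughly flat as long as there are enough cores, because every thread does the same number of iterations. Absolute timings depend on the machine, so the sketch makes no claim about specific numbers.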