Re: [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin

Waiman Long <waiman.long@xxxxxx> · Mon, 04 Aug 2014 14:07:48 -0400

On 08/04/2014 12:25 AM, Davidlohr Bueso wrote:
> On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
>> This patch set improves upon the rwsem optimistic spinning patch set
>> from Davidlohr to enable a better-performing rwsem and more aggressive
>> use of optimistic spinning.
>>
>> By using a microbenchmark running 1 million lock-unlock operations per
>> thread on a 4-socket 40-core Westmere-EX x86-64 test machine running
>> 3.16-rc7 based kernels, the following table shows the execution times
>> with 2/10 threads running on different CPUs on the same socket where
>> load is the number of pause instructions in the critical section:
>>
>>    lock/r:w ratio  # of threads  Load:Execution Time (ms)
>>    --------------  ------------  ------------------------
>>    mutex                 2       1:530.7, 5:406.0, 10:472.7
>>    mutex                10       1:1848 , 5:2046 , 10:4394
>>
>> Before patch:
>>    rwsem/0:1             2       1:339.4, 5:368.9, 10:394.0
>>    rwsem/1:1             2       1:2915 , 5:2621 , 10:2764
>>    rwsem/10:1            2       1:891.2, 5:779.2, 10:827.2
>>    rwsem/0:1            10       1:5618 , 5:5722 , 10:5683
>>    rwsem/1:1            10       1:14562, 5:14561, 10:14770
>>    rwsem/10:1           10       1:5914 , 5:5971 , 10:5912
>>
>> After patch:
>>    rwsem/0:1             2       1:161.1, 5:244.4, 10:271.4
>>    rwsem/1:1             2       1:188.8, 5:212.4, 10:312.9
>>    rwsem/10:1            2       1:168.8, 5:179.5, 10:209.8
>>    rwsem/0:1            10       1:1306 , 5:1733 , 10:1998
>>    rwsem/1:1            10       1:1512 , 5:1602 , 10:2093
>>    rwsem/10:1           10       1:1267 , 5:1458 , 10:2233
>>
>> % Change:
>>    rwsem/0:1             2       1:-52.5%, 5:-33.7%, 10:-31.1%
>>    rwsem/1:1             2       1:-93.5%, 5:-91.9%, 10:-88.7%
>>    rwsem/10:1            2       1:-81.1%, 5:-77.0%, 10:-74.6%
>>    rwsem/0:1            10       1:-76.8%, 5:-69.7%, 10:-64.8%
>>    rwsem/1:1            10       1:-89.6%, 5:-89.0%, 10:-85.8%
>>    rwsem/10:1           10       1:-78.6%, 5:-75.6%, 10:-62.2%
>
> So at a very low level you see nicer results, which aren't really
> translating to much of a significant impact at a higher level (aim7).

I was using a 4-socket system for testing. I believe the performance
gain will be higher on larger machines. I will run some tests on those
larger machines as well.

>> It can be seen that there is a dramatic reduction in the execution
>> times. The new rwsem is now even faster than mutex whether it is all
>> writers or a mixture of writers and readers.
>>
>> Running the AIM7 benchmarks on the same 40-core system (HT off),
>> the performance improvements on some of the workloads were as follows:
>>
>>    Workload                  Before Patch  After Patch  % Change
>>    --------                  ------------  -----------  --------
>>    custom (200-1000)            446135       477404       +7.0%
>>    custom (1100-2000)           449665       484734       +7.8%
>>    high_systime (200-1000)      152437       154217       +1.2%
>>    high_systime (1100-2000)     269695       278942       +3.4%
>
> I worry about complicating rwsems even _more_ than they are, especially
> for such a marginal gain. You might want to try other workloads -- e.g.
> postgresql (pgbench), I normally get pretty useful data when dealing
> with rwsems.

Thanks for the info. I will try running pgbench as well.
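
For anyone cross-checking the numbers, the "% Change" figures quoted in this thread are consistent with the usual relative-change formula, (after - before) / before. The short Python sketch below recomputes a few of them from the quoted before/after values. The function name and the selection of rows are illustrative only and are not part of the patch set or of the benchmark tooling.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the tables quoted in this thread.
# Microbenchmark rows are execution times in ms at load 1 (lower is better).
microbench = {
    "rwsem/0:1, 2 threads": (339.4, 161.1),
    "rwsem/1:1, 2 threads": (2915, 188.8),
    "rwsem/1:1, 10 threads": (14562, 1512),
}

# AIM7 rows are the reported benchmark scores; the thread describes the
# positive changes as improvements, so higher is assumed to be better.
aim7 = {
    "custom (200-1000)": (446135, 477404),
    "high_systime (200-1000)": (152437, 154217),
}

for name, (before, after) in {**microbench, **aim7}.items():
    print(f"{name:26s} {pct_change(before, after):+.1f}%")
```

Put as a ratio instead of a percentage, the rwsem/1:1 two-thread case at load 1 runs about 15 times faster after the patch (2915 ms vs. 188.8 ms), while the AIM7 gains stay in the single-digit percent range.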

-Longman