Hi Michael,

Following the polling patch thread: http://marc.info/?l=kvm&m=140853271510179&w=2,
I changed poll_stop_idle to be counted in microseconds, and carried out experiments
using varying values of this parameter.

The setup for netperf consisted of 1 VM and 1 vhost thread, each running on its own
dedicated core.

Here are the numbers for netperf (micro benchmark):

polling    |Send |Throughput|Utilization |S. Demand   |vhost|exits|throughput|throughput
mode       |Msg  |          |Send  Recv  |Send  Recv  |util |/sec |/cpu      |/cpu
           |Size |          |local remote|local remote|     |     |          |% change
           |bytes|10^6bits/s|%     %     |us/KB us/KB |%    |     |          |
------------------------------------------------------------------------------------------
NoPolling    64   1054.11    99.97  3.01   7.78  3.74  38.80  92K    7.60
Polling=1    64   1036.67    99.97  2.93   7.90  3.70  53.00  92K    6.78    -10.78
Polling=5    64   1079.27    99.97  3.07   7.59  3.73  83.00  90K    5.90    -22.35
Polling=7    64   1444.90    99.97  3.98   5.67  3.61  95.00  19.5K  7.41     -2.44
Polling=10   64   1521.70    99.97  4.21   5.38  3.63  98.00  8.5K   7.69      1.19
Polling=25   64   1534.24    99.97  4.18   5.34  3.57  99.00  8.5K   7.71      1.51
Polling=50   64   1534.24    99.97  4.18   5.34  3.57  99.00  8.5K   7.71      1.51

NoPolling   128   1577.39    99.97  4.09   5.19  3.40  54.00  113K  10.24
Polling=1   128   1596.08    99.97  4.22   5.13  3.47  71.00  120K   9.34     -8.88
Polling=5   128   2238.49    99.97  5.45   3.66  3.19  92.00  24K   11.66     13.82
Polling=7   128   2330.97    99.97  5.59   3.51  3.14  95.00  19.5K 11.96     16.70
Polling=10  128   2375.78    99.97  5.69   3.45  3.14  98.00  10K   12.00     17.14
Polling=25  128   2655.01    99.97  2.45   3.09  1.21  99.00  8.5K  13.34     30.25
Polling=50  128   2655.01    99.97  2.45   3.09  1.21  99.00  8.5K  13.34     30.25

NoPolling   256   2558.10    99.97  2.33   3.20  1.20  67.00  120K  15.32
Polling=1   256   2508.93    99.97  3.13   3.27  1.67  75.00  125K  14.34     -6.41
Polling=5   256   3740.34    99.97  2.70   2.19  0.95  94.00  17K   19.28     25.86
Polling=7   256   3692.69    99.97  2.80   2.22  0.99  97.00  15.5K 18.75     22.37
Polling=10  256   4036.60    99.97  2.69   2.03  0.87  99.00  8.5K  20.29     32.42
Polling=25  256   3998.89    99.97  2.64   2.05  0.87  99.00  8.5K  20.10     31.18
Polling=50  256   3998.89    99.97  2.64   2.05  0.87  99.00  8.5K  20.10     31.18

NoPolling   512   4531.50    99.90  2.75   1.81  0.79  78.00  55K   25.47
Polling=1   512   4684.19    99.95  2.69   1.75  0.75  83.00  35K   25.60      0.52
Polling=5   512   4932.65    99.75  2.75   1.68  0.74  91.00  12K   25.86      1.52
Polling=7   512   5226.14    99.86  2.80   1.57  0.70  95.00  7.5K  26.82      5.30
Polling=10  512   5464.90    99.60  2.90   1.49  0.70  96.00  8.2K  27.94      9.69
Polling=25  512   5550.44    99.58  2.84   1.47  0.67  99.00  7.5K  27.95      9.73
Polling=50  512   5550.44    99.58  2.84   1.47  0.67  99.00  7.5K  27.95      9.73

As you can see from the last column, polling improves performance in most cases.

I also ran memcached (macro benchmark), where (as in the previous benchmark) the VM
and vhost each get their own dedicated core. I configured memslap with C=128, T=8,
as this configuration was required to produce enough load to saturate the VM. I tried
several other configurations, but this one produced the maximal throughput (for the
baseline). Example invocations are sketched below, after the memcached numbers.

The numbers for memcached (macro benchmark):

polling     |time  |TPS    |Net  |vhost|vm  |exits|TPS/cpu|TPS/cpu
mode        |      |       |rate |util |util|/sec |       |% change
            |      |       |     |%    |%   |     |       |
------------------------------------------------------------------
Disabled     15.9s  125819  91.5   45    99   87K    873.74
polling=1    15.8s  126820  92.3   60    99   87K    797.61  -8.71
polling=5    12.8s  155799  113.4  79    99   25.5K  875.28   0.18
polling=10   11.7s  160639  116.9  83    99   16.3K  882.63   1.02
polling=15   12.4s  160897  117.2  87    99   15K    865.04  -1.00
polling=100  11.7s  170971  124.4  99    99   30     863.49  -1.17
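As an illustration, load of this kind can be generated with invocations along the
following lines. This is a sketch only: the peer addresses, test lengths and request
count are placeholders, and I am assuming a netperf TCP_STREAM test for the send
message-size sweep and the memaslap front end of libmemcached for the C/T
(concurrency/threads) options; they are not the exact commands from this thread.

    # netperf micro benchmark: single stream to the peer, send message
    # size swept over 64/128/256/512 bytes (shown here for 64 bytes)
    netperf -H <peer-ip> -t TCP_STREAM -l 60 -- -m 64

    # memcached macro benchmark: 8 client threads, 128 concurrent
    # connections, fixed number of requests so the completion time
    # ("time" column above) can be compared across runs
    memaslap -s <memcached-ip>:11211 -T 8 -c 128 -x 2000000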
For memcached, TPS/cpu does not show a significant difference in any of the cases.
However, TPS itself improved by up to 35%, which can be useful for under-utilized
systems that have CPU time to spare for extra throughput.

If it makes sense to you, I will continue with the other changes requested for the
patch.

Thank you,
Razya

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html