Re: [RFC PATCH 1/1] NUMA aware scheduling per cpu vhost thread

On 03/23/2012 11:32 AM, Thomas Lendacky wrote:
I ran a series of TCP_RR, UDP_RR, TCP_STREAM and TCP_MAERTS tests
against the recent vhost patches. For simplicity, the patches
submitted by Anthony that increase the number of threads per vhost
instance I will call multi-worker and the patches submitted by Shirley
that provide a vhost thread per cpu I will call per-cpu.

Lots of nice data there - kudos.

Quick description of the tests:
   TCP_RR and UDP_RR using 256 byte request/response size in 1, 10, 30
   and 60 instances

There is a point, and I'm not quite sure where, at which aggregate, synchronous single-transaction netperf tests become as much a context switching test as a networking test. That is why netperf RR has support for "burst mode", which keeps more than one transaction in flight at one time:

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-_002d_002denable_002dburst

When looking to measure packet/transaction-per-second scaling I've taken to finding the peak for a single stream by running up the burst size (with TCP_NODELAY set) and then running 1, 2, 4, etc. of those streams, with the occasional ethtool -S audit to make sure that each TCP_RR transaction is indeed a discrete pair of TCP segments...
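
A minimal sketch of that kind of sweep, assuming a netperf built with --enable-burst. It only builds the command lines rather than running anything. The host address 192.0.2.10, the 30 second run length, the list of burst sizes and the helper names are placeholders for illustration, not anything taken from the runs in this thread. -r, -b and -D are netperf's test-specific request/response size, burst and TCP_NODELAY options.

```python
import shlex


def burst_rr_command(host, burst, duration_s=30, req_size=256, rsp_size=256):
    """Build a netperf TCP_RR command line with `burst` extra transactions in flight."""
    return [
        "netperf", "-H", host, "-t", "TCP_RR", "-l", str(duration_s),
        "--",                            # test-specific options follow
        "-r", f"{req_size},{rsp_size}",  # request,response sizes in bytes
        "-b", str(burst),                # requires a --enable-burst build
        "-D",                            # set TCP_NODELAY
    ]


def stream_counts(max_streams):
    """Return 1, 2, 4, ... up to and including max_streams."""
    counts = []
    n = 1
    while n <= max_streams:
        counts.append(n)
        n *= 2
    return counts


if __name__ == "__main__":
    # Step 1: run one stream at increasing burst sizes and note where the
    # transaction rate stops improving.
    for burst in (0, 1, 2, 4, 8, 16, 32):
        print(shlex.join(burst_rr_command("192.0.2.10", burst)))
    # Step 2: run this many concurrent copies of the best single-stream command.
    print(stream_counts(8))
```

Picking the "best" burst size from step 1 is left to whoever reads the results; the sketch does not try to automate that judgment.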

In addition to avoiding concerns about becoming a context switching exercise, the reduction in netperf instances means less chance for skew error on startup and shutdown. To address that I've somewhat recently taken to using demo mode in netperf and then post-processing the results through rrdtool:

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-_002d_002denable_002ddemo

I have a "one to many" script for that under:

http://www.netperf.org/svn/netperf2/trunk/doc/examples/runemomniaggdemo.sh

which is then post-processed via some stone knives and bearskins:
http://www.netperf.org/svn/netperf2/trunk/doc/examples/post_proc.sh
http://www.netperf.org/svn/netperf2/trunk/doc/examples/vrules.awk
http://www.netperf.org/svn/netperf2/trunk/doc/examples/mins_maxes.awk

I've also used that basic idea in some many to many tests involving 512 concurrent netperf instances but that script isn't up on netperf.org.
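
The idea behind the demo-mode post-processing can be sketched roughly as follows. This is not what post_proc.sh does line for line; it is a simplified illustration that assumes the interim results have already been parsed into (end_time, rate) pairs per netperf instance, and it only counts the window in which every instance was running. The function names are made up for the example.

```python
def overlap_window(instances):
    """Return (start, end) of the interval during which all instances reported."""
    start = max(samples[0][0] for samples in instances)
    end = min(samples[-1][0] for samples in instances)
    return start, end


def aggregate_rate(instances):
    """Sum each instance's average rate, using only samples inside the overlap."""
    start, end = overlap_window(instances)
    total = 0.0
    for samples in instances:
        inside = [rate for t, rate in samples if start <= t <= end]
        if not inside:
            raise ValueError("instances never all ran at the same time")
        total += sum(inside) / len(inside)
    return total


if __name__ == "__main__":
    # Two instances with skewed start and stop times. The ramp-up sample of
    # the first and the ramp-down sample of the second fall outside the window.
    a = [(1.0, 10.0), (2.0, 100.0), (3.0, 100.0)]
    b = [(2.0, 50.0), (3.0, 50.0), (4.0, 5.0)]
    print(overlap_window([a, b]), aggregate_rate([a, b]))
```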

   TCP_STREAM and TCP_MAERTS using 256, 1K, 4K and 16K message sizes
   and 1 and 4 instances

Netperf's own documentation and output are probably not good on this point (feel free to loose petards, though some instances may be cast in stone) but those aren't really message sizes. They are simply the quantity of data netperf presents to the transport in any one send call. They are send sizes.
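
A small illustration of the difference, using a connected pair of local stream sockets rather than a real TCP connection between hosts (an assumption made to keep the example self-contained; the byte-stream behaviour is the point). The sender makes four 256-byte send calls, and the receiver gets back however many chunks the transport chooses to deliver. Only the total is guaranteed.

```python
import socket


def send_in_chunks(send_size, count):
    """Send `count` buffers of `send_size` bytes; return the sizes recv() handed back."""
    a, b = socket.socketpair()  # connected pair of stream sockets
    with a, b:
        for _ in range(count):
            a.sendall(b"x" * send_size)  # one "send size" worth of data per call
        a.shutdown(socket.SHUT_WR)       # signal end of data to the reader
        recv_sizes = []
        while True:
            data = b.recv(65536)
            if not data:
                break
            recv_sizes.append(len(data))
    return recv_sizes


if __name__ == "__main__":
    sizes = send_in_chunks(256, 4)
    # The number and size of the received chunks may vary from run to run;
    # the send calls do not survive as message boundaries.
    print(sizes, sum(sizes))
```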

   Remote host to VM using 1, 4, 12 and 24 VMs (2 vCPUs) with the tests
   running between an external host and each VM.

I suppose it is implicit, and I'm just being pedantic/paranoid, but are you confident of the limits of the external host?

   Local VM to VM using 2, 4, 12 and 24 VMs (2 vCPUs) with the tests
   running between VM pairs on the same host (no TCP_MAERTS done in
   this situation).

For TCP_RR and UDP_RR tests I report the transaction rate as the
score and the transaction rate / KVMhost CPU% as the efficiency.

For TCP_STREAM and TCP_MAERTS tests I report the throughput in Mbps
as the score and the throughput / KVMhost CPU% as the efficiency.
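
As a worked example of the efficiency definition above: since efficiency is score divided by KVM host CPU%, the CPU% behind any table entry can be backed out as score divided by efficiency. The sketch below does that arithmetic. The result is only approximate, because the efficiency values in the tables are rounded, and the function names are chosen for the example.

```python
def efficiency(score, cpu_percent):
    """Efficiency as defined above: score per percent of KVM host CPU."""
    return score / cpu_percent


def implied_cpu_percent(score, eff):
    """Back out the approximate host CPU% from a reported score and efficiency."""
    return score / eff


if __name__ == "__main__":
    # TCP_RR, remote host to VM, Base column, 60 instances per VM.
    print(round(implied_cpu_percent(117448, 3929), 1))  # 1 VM
    print(round(implied_cpu_percent(144684, 1457), 1))  # 24 VMs
```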

The KVM host machine is a nehalem-based 2-socket, 4-cores/socket
system (E5530 @ 2.40GHz) with hyperthreading disabled and an Intel
10GbE single port network adapter.

There's a lot of data and I hope this is the clearest way to report
it.  The remote host to VM results are first followed by the local
VM to VM results.

Looks reasonable as far as presentation goes. Might have included a summary table of the various peaks:

TCP_RR Remote Host to VM:
        Inst     -   Base    -  -Multi-Worker- -  Per-CPU  -
    VMs  /VM    Score   Eff    Score   Eff    Score   Eff
      1      60 117,448 3,929  148,330 3,616  137,996 3,898
      4      60 308,838 3,555  170,486 1,738  285,073 2,988
     12      60 156,868 1,574  152,205 1,527  223,701 2,250
     24      60 144,684 1,457  146,788 1,468  240,963 2,513

Given the KVM host machine is 8 cores with hyperthreading disabled, I might have included a data point at 8 VMs even if they were 2 vCPU VMs, but that is just my gut talking. Certainly looking at the summary table I'm wondering where between 4 and 12 VMs the curve starts its downward trend. Do 12 and 24 2-vCPU VMs force more moving around than, say, 16 or 32 would?
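
For what it is worth, the summary table can be read off mechanically. The sketch below takes the TCP_RR scores from the table above, finds the VM count at which each variant peaks among the measured points, and expresses a variant's score as a percent change against Base. With only 1, 4, 12 and 24 VMs measured it cannot say where between 4 and 12 the curve actually turns over. The dictionary layout and function names are just for the example.

```python
# TCP_RR, remote host to VM, 60 instances per VM (scores from the summary table).
SCORES = {
    "base":         {1: 117448, 4: 308838, 12: 156868, 24: 144684},
    "multi-worker": {1: 148330, 4: 170486, 12: 152205, 24: 146788},
    "per-cpu":      {1: 137996, 4: 285073, 12: 223701, 24: 240963},
}


def peak_vm_count(series):
    """Return the measured VM count with the highest score."""
    return max(series, key=series.get)


def percent_vs_base(variant, vms):
    """Percent change of `variant` relative to Base at a given VM count."""
    base = SCORES["base"][vms]
    return (SCORES[variant][vms] - base) / base * 100.0


if __name__ == "__main__":
    for name, series in SCORES.items():
        print(f"{name:12s} peaks at {peak_vm_count(series)} VMs")
    for vms in (1, 4, 12, 24):
        print(f"{vms:2d} VMs: per-cpu {percent_vs_base('per-cpu', vms):+.1f}% vs base")
```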

happy benchmarking,

rick jones



Remote Host to VM:
  Host to 1 VM
                 -   Base    -  -Multi-Worker- -  Per-CPU  -
   Test     Inst   Score   Eff    Score   Eff    Score   Eff
   TCP_RR      1   9,587   984    9,725 1,145    9,252 1,041
              10  63,919 3,095   51,841 2,415   55,226 2,884
              30  85,646 3,288  127,277 3,242  145,644 4,092
              60 117,448 3,929  148,330 3,616  137,996 3,898

   UDP_RR      1  10,815 1,174   10,125 1,255    7,913 1,150
              10  53,989 3,082   59,590 2,875   52,353 3,328
              30  91,484 4,115   95,312 3,042  110,715 3,659
              60 107,466 4,689  173,443 4,351  158,141 4,235

   TCP_STREAM
          256  1   2,724   140    2,450   131    2,681   150
               4   5,027   137    4,147   146    3,998   117

         1024  1   5,602   235    4,623   169    5,425   238
               4   5,987   212    5,991   133    6,827   175

         4096  1   6,202   256    6,753   211    7,247   279
               4   4,996   192    5,771   159    7,124   202

        16384  1   6,258   259    7,211   214    8,453   308
               4   4,591   179    5,788   181    6,925   217

   TCP_MAERTS
          256  1   1,951    85    1,871    89    1,899    97
               4   4,757   129    4,102   140    4,279   116

         1024  1   7,479   381    6,970   371    7,374   427
               4   8,931   385    6,612   258    8,731   417

         4096  1   9,276   464    9,296   456    9,131   510
               4   9,381   452    9,032   367    9,338   446

        16384  1   9,153   496    8,817   589    9,238   516
               4   9,358   478    9,006   367    9,350   462

  Host to 1 VM (VM pinned to a socket)
                 -   Base    -  -Multi-Worker- -  Per-CPU  -
   Test     Inst   Score   Eff    Score   Eff    Score   Eff
   TCP_RR      1   9,992 1,019    9,899   917    8,963   899
              10  60,731 3,236   60,015 2,444   55,860 3,059
              30 127,375 4,042  146,571 3,922  163,806 4,389
              60 173,021 4,972  149,549 4,662  161,397 4,330

   UDP_RR      1  10,854 1,253    7,983 1,120    7,647 1,206
              10  68,128 3,804   64,335 4,067   53,343 3,233
              30  92,456 3,994  112,101 4,219  111,610 3,598
              60 135,741 4,590  184,441 4,422  184,527 4,546

   TCP_STREAM
          256  1   2,564   146    2,530   147    2,497   150
               4   4,757   139    4,300   127    4,245   124

         1024  1   4,700   209    6,062   323    5,627   247
               4   6,828   214    7,125   153    6,561   172

         4096  1   6,676   281    7,672   286    7,760   290
               4   6,258   236    6,410   171    7,354   225

        16384  1   6,712   289    8,217   297    8,457   322
               4   5,764   235    6,285   200    7,554   245

   TCP_MAERTS
          256  1   1,673    82    1,444    71    1,756    88
               4   6,385   175    5,671   155    5,685   153

         1024  1   7,500   427    6,884   414    7,640   429
               4   9,310   444    8,659   496    8,200   350

         4096  1   8,427   477    9,201   515    8,825   422
               4   9,372   478    9,184   394    9,391   446

        16384  1   8,840   500    9,205   555    9,239   482
               4   9,379   495    9,079   385    9,389   472

  Host to 4 VMs
                 -   Base    -  -Multi-Worker- -  Per-CPU  -
   Test     Inst   Score   Eff    Score   Eff    Score   Eff
   TCP_RR      1  38,635   949   34,063   843   35,432   897
              10 193,703 2,604  157,699 1,841  180,323 2,858
              30 279,736 3,301  170,343 1,739  269,827 2,875
              60 308,838 3,555  170,486 1,738  285,073 2,988

   UDP_RR      1  42,209 1,136   36,035   904   36,974   975
              10 177,286 2,616  166,999 2,043  178,470 2,466
              30 296,415 3,731  221,738 2,488  260,630 2,966
              60 353,784 4,179  209,489 2,152  306,792 3,440

   TCP_STREAM
          256  1   8,409   113    7,517   101    7,178   115
               4   8,963    93    7,825    80    8,606    91

         1024  1   9,382   119   10,223   192    9,314   128
               4   9,233   101    9,085   110    8,585   105

         4096  1   9,391   124    9,393   125    9,300   140
               4   9,303   103    9,151   102    8,601   106

        16384  1   9,395   121    8,715   128    9,378   135
               4   9,322   105    9,135   101    8,691   121

   TCP_MAERTS
          256  1   8,629   125    7,045   112    7,559   109
               4   9,389   145    7,091    80    9,335   156

         1024  1   9,385   201    9,349   148    9,320   248
               4   9,392   154    9,340   148    9,390   226

         4096  1   9,387   239    9,339   151    9,379   291
               4   9,392   167    9,389   124    9,390   259

        16384  1   9,374   236    9,366   150    9,391   317
               4   9,365   167    9,394   123    9,390   284

  Host to 12 VMs
                 -   Base    -  -Multi-Worker- -  Per-CPU  -
   Test     Inst   Score   Eff    Score   Eff    Score   Eff
   TCP_RR      1  79,628   928   85,717   944   72,760   885
              10 106,348 1,067   94,032   944  164,548 2,017
              30 131,313 1,318  116,431 1,168  206,560 2,367
              60 156,868 1,574  152,205 1,527  223,701 2,250

   UDP_RR      1  90,762 1,059   93,904 1,037   75,512   919
              10 149,381 1,499  113,254 1,136  194,153 1,951
              30 177,803 1,783  132,818 1,333  235,682 2,370
              60 201,833 2,025  154,871 1,554  258,133 2,595

   TCP_STREAM
          256  1   8,549    86    7,173    72    8,407    85
               4   8,910    89    8,693    87    8,768    88

         1024  1   9,397    95    9,371    94    9,376    95
               4   9,289    93    9,268   100    8,898    92

         4096  1   9,399    95    9,415    95    9,401    97
               4   9,336    94    9,319    94    8,938    94

        16384  1   9,405    95    9,402    96    9,397   102
               4   9,366    94    9,345    94    8,890    94

   TCP_MAERTS
          256  1   4,646    49    2,273    23    9,232   135
               4   9,393   107    8,019    81    9,414   134

         1024  1   9,393   115    9,403   104    9,399   178
               4   9,406   110    9,383    98    9,392   157

         4096  1   9,393   114    9,409   104    9,388   202
               4   9,388   110    9,387    98    9,382   181

        16384  1   9,396   114    9,391   104    9,394   221
               4   9,411   110    9,384    98    9,391   192

  Host to 24 VMs
                 -   Base    -  -Multi-Worker- -  Per-CPU  -
   Test     Inst   Score   Eff    Score   Eff    Score   Eff
   TCP_RR      1 110,139 1,118  101,765 1,033   79,189   805
              10  94,757   948   90,872   915  156,821 1,581
              30 119,904 1,199  120,728 1,207  214,151 2,211
              60 144,684 1,457  146,788 1,468  240,963 2,513

   UDP_RR      1 129,655 1,316  120,071 1,201   91,208   914
              10 119,204 1,201  104,645 1,046  208,432 2,340
              30 158,887 1,601  136,629 1,366  249,329 2,517
              60 179,365 1,794  159,883 1,610  259,018 2,651

   TCP_STREAM
          256  1   5,899    59    4,258    44    8,071    82
               4   8,739    89    8,195    83    7,934    82

         1024  1   8,477    86    7,498    76    9,268    93
               4   9,205    93    9,171    94    8,159    84

         4096  1   9,334    96    8,992    92    9,324    97
               4   9,255    95    9,221    92    8,237    85

        16384  1   9,373    96    9,356    95    9,311    96
               4   9,283    94    9,275    93    8,317    86

   TCP_MAERTS
          256  1     739     7      770     8    9,186   129
               4   7,804    79    7,573    76    9,253   122

         1024  1   1,763    18    1,759    18    9,287   146
               4   9,204    99    9,166    93    9,389   155

         4096  1   3,430    35    3,403    35    9,348   161
               4   9,372   100    9,315    95    9,385   151

        16384  1   9,309   102    9,306    97    9,353   175
               4   9,378   100    9,392    96    9,377   159



Local VM to VM:

  1 VM to 1 VM
                 -   Base    -  -Multi-Worker- -  Per-CPU  -
   Test     Inst   Score   Eff    Score   Eff    Score   Eff
   TCP_RR      1   7,422   506    7,698   462    6,281   450
              10  49,662 1,362   47,553 1,205   43,258 1,270
              30  91,657 1,538   99,319 1,471   89,478 1,499
              60 106,168 1,658  106,430 1,503   99,205 1,576

   UDP_RR      1   8,414   552    8,532   528    6,976   499
              10  58,359 1,645   55,283 1,398   48,094 1,457
              30  91,046 1,736  109,403 1,721   92,109 1,715
              60 128,835 2,021  130,382 1,807  118,563 1,853

   TCP_STREAM
          256  1   2,029    60    1,923    54    1,998    64
               4   3,861    66    3,445    53    2,914    54

         1024  1   7,374   205    6,465   174    5,704   165
               4   8,474   196    7,541   161    6,274   156

         4096  1  12,825   295   11,921   275   10,262   262
               4  12,639   253   13,395   260   11,451   264

        16384  1  14,576   331   14,141   291   11,925   305
               4  16,016   327   14,210   274   13,656   308


  1 VM to 1 VM (each VM pinned to a socket)
                 -   Base    -  -Multi-Worker- -  Per-CPU  -
   Test     Inst   Score   Eff    Score   Eff    Score   Eff
   TCP_RR      1   7,145   489    7,840   477    5,965   467
              10  51,016 1,406   47,881 1,223   45,232 1,288
              30  92,785 1,580  103,453 1,512   91,437 1,523
              60 120,160 1,817  115,058 1,595  102,734 1,611

   UDP_RR      1   7,908   547    8,704   541    6,552   528
              10  59,807 1,653   56,598 1,435   50,524 1,488
              30  90,302 1,738  113,861 1,765   94,640 1,720
              60 141,684 2,196  141,866 1,919  125,334 1,917

   TCP_STREAM
          256  1   2,210    64    1,291    32    2,069    64
               4   3,993    64    3,441    52    2,780    50

         1024  1   8,106   217    7,571   198    5,709   165
               4   8,471   206    8,756   174    6,531   157

         4096  1  15,360   350   13,825   303   10,717   271
               4  14,671   330   12,604   263   11,266   258

        16384  1  18,284   395   16,305   337   13,185   317
               4  15,451   331   12,438   247   14,699   316


  2 VMs to 2 VMs (4 VMs total)
                 -   Base    -  -Multi-Worker- -  Per-CPU  -
   Test     Inst   Score   Eff    Score   Eff    Score   Eff
   TCP_RR      1  15,498   491   16,518   460   13,008   441
              10  71,425   983   79,711 1,063   85,087 1,037
              30 102,132 1,436   82,191 1,145  100,504 1,076
              60 127,670 1,608   96,815 1,262  104,694 1,119

   UDP_RR      1  17,091   548   18,214   538   14,780   492
              10  77,682 1,129   87,523 1,235   86,755 1,165
              30 131,830 1,826   92,844 1,327  111,839 1,232
              60 145,688 1,952  111,315 1,520  116,358 1,296

   TCP_STREAM
          256  1   5,085    72    3,900    50    2,430    38
               4   6,622    70    4,337    48    5,032    58

         1024  1  15,262   206   15,022   195    7,000   115
               4  14,205   174   15,288   174   11,030   148

         4096  1  15,020   197   21,694   261   13,583   198
               4  16,818   205   16,076   195   17,175   238

        16384  1  19,671   261   23,699   290   22,396   306
               4  18,648   229   17,901   218   17,122   251

  6 VMs to 6 VMs (12 VMs total)
                 -   Base    -  -Multi-Worker- -  Per-CPU  -
   Test     Inst   Score   Eff    Score   Eff    Score   Eff
   TCP_RR      1  30,242   400   32,281   390   27,737   401
              10  73,461   783   61,856   644   93,259 1,000
              30  98,638 1,034   81,799   844  107,022 1,121
              60 114,238 1,200   91,772   944  110,839 1,152

   UDP_RR      1  33,017   438   35,540   429   30,022   438
              10  84,676   910   67,838   711  112,339 1,220
              30 110,799 1,156   90,555   932  128,928 1,357
              60 129,679 1,354  100,715 1,033  136,503 1,429

   TCP_STREAM
          256  1   6,947    72    5,380    56    6,138    72
               4   8,400    85    7,660    77    8,893    89

         1024  1  13,698   146   10,307   108   13,023   158
               4  15,391   157   13,242   135   17,264   182

         4096  1  18,928   202   14,580   154   16,970   189
               4  18,826   191   17,262   175   19,558   212

        16384  1  22,176   234   17,716   187   21,245   243
               4  21,306   215   20,332   206   18,353   227

  12 VMs to 12 VMs (24 VMs total)
                 -   Base    -  -Multi-Worker- -  Per-CPU  -
   Test     Inst   Score   Eff    Score   Eff    Score   Eff
   TCP_RR      1  72,926   731   67,338   675   32,662   387
              10  62,441   625   59,277   594   87,286   891
              30  72,761   728   67,760   679  102,549 1,041
              60  78,087   782   74,654   748  100,687 1,016

   UDP_RR      1  82,662   829   80,875   810   34,915   421
              10  71,424   716   67,754   679  111,753 1,147
              30  79,495   796   75,512   756  134,576 1,372
              60  83,339   835   77,523   778  137,058 1,390

   TCP_STREAM
          256  1   2,870    29    2,631    26    7,907    80
               4   8,424    84    8,026    80    8,929    90

         1024  1   3,674    37    3,121    31   15,644   164
               4  14,256   143   13,342   134   16,116   168

         4096  1   5,068    51    4,366    44   16,179   168
               4  17,015   171   16,321   164   17,940   186

        16384  1   9,768    98    9,025    90   19,233   203
               4  18,981   190   18,202   183   18,964   203


On Thursday, March 22, 2012 05:16:30 PM Shirley Ma wrote:
Resubmit it with the right format.

Signed-off-by: Shirley Ma<xma@xxxxxxxxxx>
Signed-off-by: Krishna Kumar<krkumar2@xxxxxxxxxx>
Tested-by: Tom Lendacky<toml@xxxxxxxxxx>
---

  drivers/vhost/net.c                  |   26 ++-
  drivers/vhost/vhost.c                |  300 ++++++++++++++++++++++++----------
  drivers/vhost/vhost.h                |   16 ++-
  3 files changed, 243 insertions(+), 103 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
