Krishna Kumar2/India/IBM wrote on 10/14/2010 02:34:01 PM:

> void vhost_poll_queue(struct vhost_poll *poll)
> {
>         struct vhost_virtqueue *vq = vhost_find_vq(poll);
>
>         vhost_work_queue(vq, &poll->work);
> }
>
> Since poll batches packets, find_vq does not seem to add much
> to the CPU utilization (or BW). I am sure that code can be
> optimized much better.
>
> The results I sent in my last mail were without your use_mm
> patch, and the only tuning was to make vhost threads run on
> only cpus 0-3 (though the performance is good even without
> that). I will test it later today with the use_mm patch too.

There's a significant reduction in CPU/SD utilization with your
patch. Following is the performance of ORG vs MQ+mm patch:

_________________________________________________
            Org vs MQ+mm patch txq=2
   #      BW%         CPU/RCPU%           SD/RSD%
_________________________________________________
   1     2.26    -1.16      .27   -20.00        0
   2    35.07    29.90    21.81        0   -11.11
   4    55.03    84.57    37.66    26.92    -4.62
   8    73.16   118.69    49.21    45.63     -.46
  16    77.43    98.81    47.89    24.07    -7.80
  24    71.59   105.18    48.44    62.84    18.18
  32    70.91   102.38    47.15    49.22     8.54
  40    63.26    90.58    41.00    85.27    37.33
  48    45.25    45.99    11.23    14.31   -12.91
  64    42.78    41.82     5.50      .43   -25.12
  80    31.40     7.31   -18.69    15.78   -11.93
  96    27.60     7.79   -18.54    17.39   -10.98
 128    23.46   -11.89   -34.41     -.41   -25.53
_________________________________________________
BW: 40.2  CPU/RCPU: 29.9,-2.2  SD/RSD: 12.0,-15.6

Following is the performance of MQ vs MQ+mm patch:

_________________________________________________
                MQ vs MQ+mm patch
   #      BW%     CPU%    RCPU%      SD%     RSD%
_________________________________________________
   1     4.98     -.58      .84   -20.00        0
   2     5.17     2.96     2.29        0    -4.00
   4     -.18      .25     -.16     3.12      .98
   8    -5.47    -1.36    -1.98    17.18    16.57
  16    -1.90    -6.64    -3.54   -14.83   -12.12
  24     -.01    23.63    14.65    57.61    46.64
  32      .27    -3.19    -3.11   -22.98   -22.91
  40    -1.06    -2.96    -2.96    -4.18    -4.10
  48     -.28    -2.34    -3.71    -2.41    -3.81
  64     9.71    33.77    30.65    81.44    77.09
  80   -10.69   -31.07   -31.70   -29.22   -29.88
  96    -1.14     5.98      .56   -11.57   -16.14
 128     -.93   -15.60   -18.31   -19.89   -22.65
_________________________________________________
BW: 0  CPU/RCPU: -4.2,-6.1  SD/RSD: -13.1,-15.6
_________________________________________________

Each test case is for 60 secs, sum over two runs (except
when number of netperf sessions is 1, which has 7 runs
of 10 secs each), numcpus=4, numtxqs=8, etc. No tuning
other than taskset each vhost to cpus 0-3.

Thanks,

- KK