On 12/18/2012 09:42 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 18, 2012 at 01:32:47PM +0100, Paolo Bonzini wrote:
>> Hi all,
>>
>> this series adds multiqueue support to the virtio-scsi driver, based
>> on Jason Wang's work on virtio-net. It uses a simple queue steering
>> algorithm that expects one queue per CPU. LUNs in the same target always
>> use the same queue (so that commands are not reordered); queue switching
>> occurs when the request being queued is the only one for the target.
>> Also based on Jason's patches, the virtqueue affinity is set so that
>> each CPU is associated to one virtqueue.
>>
>> I tested the patches with fio, using up to 32 virtio-scsi disks backed
>> by tmpfs on the host. These numbers are with 1 LUN per target.
>>
>> FIO configuration
>> -----------------
>> [global]
>> rw=read
>> bsrange=4k-64k
>> ioengine=libaio
>> direct=1
>> iodepth=4
>> loops=20
>>
>> overall bandwidth (MB/s)
>> ------------------------
>>
>> # of targets   single-queue   multi-queue, 4 VCPUs   multi-queue, 8 VCPUs
>>  1                  540              626                    599
>>  2                  795              965                    925
>>  4                  997             1376                   1500
>>  8                 1136             2130                   2060
>> 16                 1440             2269                   2474
>> 24                 1408             2179                   2436
>> 32                 1515             1978                   2319
>>
>> (These numbers for single-queue are with 4 VCPUs, but the impact of adding
>> more VCPUs is very limited).
>>
>> avg bandwidth per LUN (MB/s)
>> ----------------------------
>>
>> # of targets   single-queue   multi-queue, 4 VCPUs   multi-queue, 8 VCPUs
>>  1                  540              626                    599
>>  2                  397              482                    462
>>  4                  249              344                    375
>>  8                  142              266                    257
>> 16                   90              141                    154
>> 24                   58               90                    101
>> 32                   47               61                     72
>
>
> Could you please try and measure host CPU utilization?

I measured and didn't see any CPU utilization regression here.

> Without this data it is possible that your host
> is undersubscribed and you are drinking up more host CPU.
>
> Another thing to note is that ATM you might need to
> test with idle=poll on host otherwise we have strange interaction
> with power management where reducing the overhead
> switches to lower power so gives you a worse IOPS.

Yeah, I measured with host CPU idle=poll and saw that the performance
improved by about 68%.

Thanks,
Wanlong Gao

>
>
>> Patch 1 adds a new API to add functions for piecewise addition for buffers,
>> which enables various simplifications in virtio-scsi (patches 2-3) and a
>> small performance improvement of 2-6%. Patches 4 and 5 add multiqueuing.
>>
>> I'm mostly looking for comments on the new API of patch 1 for inclusion
>> into the 3.9 kernel.
>>
>> Thanks to Wanlong Gao for help rebasing and benchmarking these patches.
>>
>> Paolo Bonzini (5):
>>   virtio: add functions for piecewise addition of buffers
>>   virtio-scsi: use functions for piecewise composition of buffers
>>   virtio-scsi: redo allocation of target data
>>   virtio-scsi: pass struct virtio_scsi to virtqueue completion function
>>   virtio-scsi: introduce multiqueue support
>>
>>  drivers/scsi/virtio_scsi.c   | 374 +++++++++++++++++++++++++++++-------------
>>  drivers/virtio/virtio_ring.c | 205 ++++++++++++++++++++++++
>>  include/linux/virtio.h       | 21 +++
>>  3 files changed, 485 insertions(+), 115 deletions(-)
>
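
For anyone reading the thread without the patches at hand, the queue
steering rule from the cover letter (LUNs in one target stay on one queue,
and the queue is only re-picked when the request being queued is the only
one for the target) can be sketched roughly as below. This is a userspace
illustration under assumptions, not the code from patch 5: the names
target_state, pick_queue and complete_request are hypothetical, the queue
index is assumed to equal the submitting CPU's index (one queue per CPU),
and all locking and atomics are left out.

```c
#include <assert.h>

/* Hypothetical per-target bookkeeping; not the struct used by the series. */
struct target_state {
    unsigned int reqs;   /* requests currently in flight for this target */
    unsigned int queue;  /* queue the target is currently bound to */
};

/*
 * Pick the queue for a new request submitted from CPU `cpu`.
 * Assumption: one queue per CPU, so queue index == CPU index.
 * The target is re-bound only when this request is the only one in
 * flight; otherwise it stays on its current queue so that commands
 * for the target are not reordered.
 */
unsigned int pick_queue(struct target_state *tgt, unsigned int cpu)
{
    if (tgt->reqs++ == 0)
        tgt->queue = cpu;
    return tgt->queue;
}

/* Account for a completed request on this target. */
void complete_request(struct target_state *tgt)
{
    assert(tgt->reqs > 0);
    tgt->reqs--;
}
```

With this sketch, an idle target that gets a request from CPU 2 is bound to
queue 2; a second request from CPU 3 while the first is still in flight
stays on queue 2; once both complete, the next request from CPU 3 moves the
target to queue 3.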