On Thu, 2012-06-14 at 17:57 +0800, Cong Meng wrote:
> On Wed, 2012-06-13 at 12:08 -0700, Nicholas A. Bellinger wrote:
> > On Wed, 2012-06-13 at 18:13 +0800, mengcong wrote:
> > > Hi folks, I did an I/O performance test on tcm_vhost SCSI. I want to
> > > share the test result data here.
> > >
> > >                    seq-read       seq-write      rand-read      rand-write
> > >                    8k     256k    8k     256k    8k     256k    8k     256k
> > > --------------------------------------------------------------------------
> > > bare-metal         67951  69802   67064  67075   1758   29284   1969   26360
> > > tcm-vhost-iblock   61501  66575   51775  67872   1011   22533   1851   28216
> > > tcm-vhost-pscsi    66479  68191   50873  67547   1008   22523   1818   28304
> > > virtio-blk         26284  66737   23373  65735   1724   28962   1805   27774
> > > scsi-disk          36013  60289   46222  62527   1663   12992   1804   27670
> > >
> > > unit: KB/s
> > > seq-read/write = sequential read/write
> > > rand-read/write = random read/write
> > > 8k, 256k are the I/O block sizes
> > >
> > > In the tcm-vhost-iblock test, the emulate_write_cache attribute was
> > > enabled.
> > > In the virtio-blk test, cache=none,aio=native were set.
> > > In the scsi-disk test, cache=none,aio=native were set, and an LSI HBA
> > > was used.
> > >
> > > I also tried to run the test with a scsi-generic LUN (passing the
> > > physical partition's /dev/sgX device through), but I couldn't set it
> > > up successfully. It's a pity.
> > >
> > > Benchmark tool: fio, with ioengine=aio,direct=1,iodepth=8 set for all
> > > tests.
> > > KVM VM: 2 CPUs and 2G RAM
> > >
> >
> > These initial performance results look quite promising for virtio-scsi.
> >
> > I'd be really interested to see how a raw flash block device backend
> > that can locally do ~100K 4k mixed R/W random IOPs compares with
> > virtio-scsi guest performance as the random small block fio workload
> > increases..
>
> flash block == solid state disk? I don't have one on hand.
>

It just so happens there is a FusionIO HBA nearby.. ;) However, I'm
quite busy with customer items over the next few days, but I'm really
looking forward to giving this a shot with some fast raw block flash
backends soon..

Also, it's about time to convert tcm_vhost from using the old
TFO->new_cmd_map() to native cmwq.. This will certainly help overall
tcm_vhost performance, especially with shared backends across multiple
VMs, where parts of I/O backend execution can happen in separate
kworker process context.

Btw, I'll likely end up doing this conversion to realize the
performance benefits when testing with raw flash backends, but I'm
more than happy to take a patch ahead of that if you have the extra
cycles to spare.

> > Also note there is a bottleneck wrt random small block I/O
> > performance (per LUN) on the Linux/SCSI initiator side that is
> > affecting things here. We've run into this limitation numerous times
> > when using SCSI LLDs as backend TCM devices, and I usually recommend
> > using iblock export with raw block flash backends for achieving the
> > best small block random I/O performance results. A number of high
> > performance flash storage folks do something similar with raw block
> > access (Jens is CC'ed).
> >
> > As per Stefan's earlier question, how does virtio-scsi to QEMU SCSI
> > userspace compare with these results..? Is there a reason why these
> > were not included in the initial results..?
>
> That was an oversight on my part. I will run this test pattern later.

Thanks!

--nab
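
A footnote on reproducing the workload: the fio parameters given above
(ioengine=aio,direct=1,iodepth=8) map onto roughly the invocation below.
This is a sketch rather than the poster's exact job: fio's Linux AIO
engine is normally spelled libaio, and the target device (/dev/vdb),
job name, and runtime are assumptions, since only those three
parameters were listed.

  fio --name=rand-read-8k \
      --filename=/dev/vdb \
      --rw=randread --bs=8k \
      --ioengine=libaio --direct=1 --iodepth=8 \
      --runtime=60 --group_reporting

The other table cells follow by swapping --rw (read, write, randread,
randwrite) and --bs (8k, 256k).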
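
Likewise, the backend settings mentioned in the setup correspond to
something like the following. The configfs attribute path is the
standard TCM layout, but the backstore name (iblock_0/vhost_disk) and
the device paths are placeholders, not values from the original post.

  # Enable write cache emulation on a TCM iblock backstore:
  echo 1 > /sys/kernel/config/target/core/iblock_0/vhost_disk/attrib/emulate_write_cache

  # Drive options used for the virtio-blk comparison:
  qemu-kvm ... -drive file=/dev/sdb2,if=virtio,cache=none,aio=native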