Hi Stefan,

see below.

On 13.02.2012 11:57, Stefan Hajnoczi wrote:
> On Fri, Feb 10, 2012 at 2:36 PM, Dongsu Park
> <dongsu.park@xxxxxxxxxxxxxxxx> wrote:
> > Now I'm running benchmarks with both qemu-kvm 0.14.1 and 1.0.
> >
> > - Sequential read (running inside the guest)
> >   # fio -name iops -rw=read -size=1G -iodepth 1 \
> >     -filename /dev/vdb -ioengine libaio -direct=1 -bs=4096
> >
> > - Sequential write (running inside the guest)
> >   # fio -name iops -rw=write -size=1G -iodepth 1 \
> >     -filename /dev/vdb -ioengine libaio -direct=1 -bs=4096
> >
> > For each one, I tested 3 times to get the average.
> >
> > Result:
> >
> >   seqread  with qemu-kvm 0.14.1   67.0 MByte/s
> >   seqread  with qemu-kvm 1.0      30.9 MByte/s
> >
> >   seqwrite with qemu-kvm 0.14.1   65.8 MByte/s
> >   seqwrite with qemu-kvm 1.0      30.5 MByte/s
>
> Please retry with the following commit or simply qemu-kvm.git/master.
> Avi discovered a performance regression which was introduced when the
> block layer was converted to use coroutines:
>
> $ git describe 39a7a362e16bb27e98738d63f24d1ab5811e26a8
> v1.0-327-g39a7a36
>
> (This commit is not in 1.0!)
>
> Please post your qemu-kvm command-line.
>
> 67 MB/s sequential 4 KB read means 67 * 1024 / 4 = 17152 requests per
> second, so 58 microseconds per request.
>
> Please post the fio output so we can double-check what is reported.

As you suggested, I tested again with revision v1.0-327-g39a7a36,
which includes commit 39a7a36. The result is still not good, though:

  seqread   : 20.3 MByte/s
  seqwrite  : 20.1 MByte/s
  randread  : 20.5 MByte/s
  randwrite : 20.0 MByte/s

My qemu-kvm command line is as follows:

=======================================================================
/usr/bin/kvm -S -M pc-0.14 -enable-kvm -m 1024 \
 -smp 1,sockets=1,cores=1,threads=1 -name mydebian3_8gb \
 -uuid d99ad012-2fcc-6f7e-fbb9-bc48b424a258 -nodefconfig -nodefaults \
 -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mydebian3_8gb.monitor,server,nowait \
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
 -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
 -drive file=/var/lib/libvirt/images/mydebian3_8gb.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native \
 -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
 -drive file=/dev/ram0,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native \
 -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 \
 -netdev tap,fd=19,id=hostnet0 \
 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:68:9f:d0,bus=pci.0,addr=0x3 \
 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus \
 -device AC97,id=sound0,bus=pci.0,addr=0x4 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
=======================================================================

As you can see, /dev/ram0 is mapped to /dev/vdb on the guest side,
which is the device used for the fio tests.
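For comparison, applying the same per-request arithmetic you quoted
above to the new numbers (just a rough back-of-the-envelope check):
roughly 20 MByte/s of sequential 4 KB reads is about 20 * 1024 / 4
≈ 5100 requests per second, i.e. around 195 microseconds per request.
That matches the fio sample below (iops=5025, avg slat 24 usec +
avg clat 169 usec ≈ 193 usec), so each request now takes more than
three times the 58 microseconds you calculated for the 0.14.1 case.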
Here is a sample of the fio output:

=======================================================================
# fio -name iops -rw=read -size=1G -iodepth 1 -filename /dev/vdb \
  -ioengine libaio -direct=1 -bs=4096
iops: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [R] [100.0% done] [21056K/0K /s] [5140/0 iops] [eta 00m:00s]
iops: (groupid=0, jobs=1): err= 0: pid=1588
  read : io=1024MB, bw=20101KB/s, iops=5025, runt= 52166msec
    slat (usec): min=4, max=6461, avg=24.00, stdev=19.75
    clat (usec): min=0, max=11934, avg=169.49, stdev=113.91
    bw (KB/s) : min=18200, max=23048, per=100.03%, avg=20106.31, stdev=934.42
  cpu : usr=5.43%, sys=23.25%, ctx=262363, majf=0, minf=28
  IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=262144/0, short=0/0
     lat (usec): 2=0.01%, 4=0.16%, 10=0.03%, 20=0.01%, 50=0.27%
     lat (usec): 100=4.07%, 250=89.12%, 500=5.76%, 750=0.30%, 1000=0.13%
     lat (msec): 2=0.12%, 4=0.02%, 10=0.01%, 20=0.01%

Run status group 0 (all jobs):
   READ: io=1024MB, aggrb=20100KB/s, minb=20583KB/s, maxb=20583KB/s,
         mint=52166msec, maxt=52166msec

Disk stats (read/write):
  vdb: ios=261308/0, merge=0/0, ticks=40210/0, in_queue=40110, util=77.14%
=======================================================================

So I don't think the coroutine-ucontext patch addresses the bottleneck
I'm looking for.

Regards,
Dongsu

p.s. Sorry for the late reply. Last week I was on vacation.