On Thu, Dec 22, 2016 at 01:34:47AM -0500, Weiwei Jia wrote:
> With QEMU x-data-plane, I find the performance has not been improved
> very much. Please see following two settings.

Using IOThreads improves scalability for SMP guests with many disks. It
does not improve performance for a single disk benchmark because there
is nothing to spread across IOThreads.

> Setting 1: I/O thread in host OS (VMM) reads 4KB each time from disk
> (8GB in total). Pin the I/O thread to pCPU 5 which will serve I/O
> thread dedicatedly. I find the performance is around 250 MB/s.

250 MB/s / 4 KB = 64k IOPS

This seems like a reasonable result for a single thread with a single
disk. I guess the benchmark queue depth setting is larger than 1,
though, because a queue depth of 1 would only allow about 15
microseconds per request.

> Setting 2: I/O thread in guest OS (VMM) reads 4KB each time from
> virtual disk (8GB in total). Pin the I/O thread to vCPU 5 and pin vCPU
> 5 thread to pCPU5 so that vCPU 5 handles this I/O thread dedicatedly
> and pCPU5 serve vCPU5 dedicatedly. In order to keep vCPU5 not to be
> idle, I also pin one cpu intensive thread (while (1) {i++}) on vCPU 5
> so that the I/O thread on it can be served without delay. For this
> setting, I find the performance for this I/O thread is around 190
> MB/s.

190 MB/s / 4 KB = 48k IOPS

I worry that your while (1) {i++} thread may prevent achieving the best
performance if the guest kernel scheduler allows it to use its time
slice. Two options that might work better are:
1. idle=poll guest kernel command-line parameter
2. kvm.ko's halt_poll_ns host kernel module parameter

> NOTE: For setting 2, I also pin the QEMU dedicated IOthread
> (x-data-plane) in host OS to pCPU to handle I/O requests from guest OS
> dedicatedly.

Which pCPU did you pin the dataplane thread to? Did you try changing
this?

> I think for setting 2, the performance of I/O thread should be almost
> the same as setting 1. I cannot understand why it is 60 MB/s lower
> than setting 1. I am wondering whether there are something wrong with
> my x-data-plane setting or virtio setting for VM. Would you please
> give me some hints? Thank you.

Ideally QEMU should achieve the same performance as bare metal. In
practice the overhead increases as IOPS increases. You may be able to
achieve 260 MB/s inside the guest with a larger request size since it
involves fewer I/O requests.

The expensive part is the virtqueue kick. Recently we tried polling the
virtqueue instead of waiting for the ioeventfd file descriptor and got
double-digit performance improvements:
https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00148.html

If you want to understand the performance of your benchmark you'll have
to compare host/guest disk stats (e.g. request lifetime, disk
utilization, queue depth, average request size) to check that the bare
metal and guest workloads are really sending comparable I/O patterns to
the physical disk.

Then you can use Linux and/or QEMU tracing to analyze the request
latency by looking at interesting points in the request lifecycle like
virtqueue kick, host Linux AIO io_submit(2), etc.

> Libvirt version: 2.4.0
> QEMU version: 2.3.0
>
> The libvirt xml configuration file is like following (I only start
> one VM with following xml config).
>
> <domain type='kvm' id='1'
>   xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
>   <name>vm</name>
>   <uuid>3290a8d0-9d9f-b2c4-dd46-5d0d8a730cd6</uuid>
>   <memory unit='KiB'>8290304</memory>
>   <currentMemory unit='KiB'>8290304</currentMemory>
>   <vcpu placement='static'>15</vcpu>
>   <cputune>
>     <vcpupin vcpu='0' cpuset='0'/>
>     <vcpupin vcpu='1' cpuset='1'/>
>     <vcpupin vcpu='2' cpuset='2'/>
>     <vcpupin vcpu='3' cpuset='3'/>
>     <vcpupin vcpu='4' cpuset='4'/>
>     <vcpupin vcpu='5' cpuset='5'/>
>     <vcpupin vcpu='6' cpuset='6'/>
>     <vcpupin vcpu='7' cpuset='7'/>
>     <vcpupin vcpu='8' cpuset='8'/>
>     <vcpupin vcpu='9' cpuset='9'/>
>     <vcpupin vcpu='10' cpuset='10'/>
>     <vcpupin vcpu='11' cpuset='11'/>
>     <vcpupin vcpu='12' cpuset='12'/>
>     <vcpupin vcpu='13' cpuset='13'/>
>     <vcpupin vcpu='14' cpuset='14'/>
>   </cputune>
>   <resource>
>     <partition>/machine</partition>
>   </resource>
>   <os>
>     <type arch='x86_64' machine='pc-i440fx-2.2'>hvm</type>
>     <boot dev='hd'/>
>   </os>
>   <features>
>     <acpi/>
>     <apic/>
>     <pae/>
>   </features>
>   <clock offset='utc'/>
>   <on_poweroff>destroy</on_poweroff>
>   <on_reboot>restart</on_reboot>
>   <on_crash>restart</on_crash>
>   <devices>
>     <emulator>/usr/bin/kvm-spice</emulator>
>     <disk type='file' device='disk'>
>       <driver name='qemu' type='raw' cache='none' io='native'/>
>       <source file='/var/lib/libvirt/images/vm.img'/>
>       <target dev='vda' bus='virtio'/>
>       <alias name='virtio-disk0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x06'
>         function='0x0'/>
>     </disk>
>     <disk type='block' device='cdrom'>
>       <driver name='qemu' type='raw'/>
>       <target dev='hdc' bus='ide'/>
>       <readonly/>
>       <alias name='ide0-1-0'/>
>       <address type='drive' controller='0' bus='1' target='0' unit='0'/>
>     </disk>
>     <controller type='usb' index='0'>
>       <alias name='usb0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x01'
>         function='0x2'/>
>     </controller>
>     <controller type='pci' index='0' model='pci-root'>
>       <alias name='pci.0'/>
>     </controller>
>     <controller type='ide' index='0'>
>       <alias name='ide0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x01'
>         function='0x1'/>
>     </controller>
>     <controller type='scsi' index='0'>
>       <alias name='scsi0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x04'
>         function='0x0'/>
>     </controller>
>     <interface type='network'>
>       <mac address='52:54:00:8e:3d:06'/>
>       <source network='default'/>
>       <target dev='vnet0'/>
>       <model type='virtio'/>
>       <alias name='net0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
>         function='0x0'/>
>     </interface>
>     <serial type='pty'>
>       <source path='/dev/pts/8'/>
>       <target port='0'/>
>       <alias name='serial0'/>
>     </serial>
>     <console type='pty' tty='/dev/pts/8'>
>       <source path='/dev/pts/8'/>
>       <target type='serial' port='0'/>
>       <alias name='serial0'/>
>     </console>
>     <input type='mouse' bus='ps2'/>
>     <input type='keyboard' bus='ps2'/>
>     <graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
>       <listen type='address' address='127.0.0.1'/>
>     </graphics>
>     <video>
>       <model type='cirrus' vram='9216' heads='1'/>
>       <alias name='video0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x02'
>         function='0x0'/>
>     </video>
>     <memballoon model='virtio'>
>       <alias name='balloon0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x05'
>         function='0x0'/>
>     </memballoon>
>   </devices>
>   <seclabel type='none'/>
>   <qemu:commandline>
>     <qemu:arg value='-set'/>
>     <qemu:arg value='device.virtio-disk0.scsi=off'/>
>     <qemu:arg value='-set'/>
>     <qemu:arg value='device.virtio-disk0.config-wce=off'/>
>     <qemu:arg value='-set'/>
>     <qemu:arg value='device.virtio-disk0.x-data-plane=on'/>
>   </qemu:commandline>
> </domain>
>
>
> Thank you,
> Weiwei Jia
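
For reference, the IOPS arithmetic in the reply can be reproduced with
a few lines of Python. This is only a rough sketch: the function names
are made up for illustration, 1 MB is taken as 1024 KB, and the
"budget" figure assumes a queue depth of 1, i.e. one request in flight
at a time.

```python
def iops(throughput_mb_s: float, request_kb: float) -> float:
    """Requests per second for a given throughput and request size.

    Assumption: 1 MB = 1024 KB, which matches the rounded figures
    (64k and 48k IOPS) quoted in the reply.
    """
    return throughput_mb_s * 1024 / request_kb


def serial_budget_us(requests_per_second: float) -> float:
    """Microseconds available per request if requests never overlap
    (queue depth 1)."""
    return 1_000_000 / requests_per_second


if __name__ == "__main__":
    # Throughput numbers reported for setting 1 (host) and setting 2 (guest).
    for label, mb_s in (("host", 250), ("guest", 190)):
        rate = iops(mb_s, 4)
        print(f"{label}: {rate:.0f} IOPS, "
              f"{serial_budget_us(rate):.1f} us per request at queue depth 1")
```

A per-request budget this small is why the benchmark is probably
running with more than one request in flight.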
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list