I am trying to get the maximum possible IOPS from a KVM virtual machine using fio, but I am bottlenecked by a single core.

I attached an NVMe SSD using 'virsh attach-disk /dev/nvme0n1p1 vdb'. This enables writethrough cache, and I get about 200k IOPS with 8 jobs and QD=32. I believe this uses the virtio-blk driver inside the guest. At this point I notice that the qemu-kvm thread becomes the bottleneck (I think it's the main loop/global mutex described in https://vmsplice.net/~stefan/stefanha-kvm-forum-2014.pdf). If I run 2 VMs (each writing to a partition on the SSD), I get about 190k IOPS in each VM, so it should be possible to achieve higher IOPS.

Next I tried using iothreads, but I still get about the same performance with 1 VM. I increased the fio jobs to 16 but did not see any increase in performance. This is my libvirt XML:

  <disk type='block' device='disk'>
    <driver name='qemu' type='raw' io='threads' iothread='1'/>
    <source dev='/dev/nvme0n1p1'/>
    <target dev='vdb' bus='virtio'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
  </disk>

Now I want to try using multiple iothreads, but I am not sure how to enable them. I think the <disk><driver> element allows only 1 iothread id. Can I specify 1 iothread per vcpu, all accessing the same drive? I couldn't figure out the right libvirt syntax for this. Should I ask the libvirt mailing list?

Also, should I use io=native?

Thanks,
Prasun
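
For reference, the iothread='1' attribute on the disk's <driver> element only takes effect if the domain itself declares at least one iothread. The fragment below is a minimal sketch of how the disk definition from this message would sit in the surrounding domain XML. The domain name "guest1" and the host CPU number in <iothreadpin> are hypothetical placeholders, the pinning is optional, and the rest of the domain definition (memory, vcpu, os and so on) is left out.

```xml
<domain type='kvm'>
  <!-- "guest1" is a hypothetical name, not taken from the original setup -->
  <name>guest1</name>

  <!-- Declare one iothread; it gets id 1, which the disk below refers to -->
  <iothreads>1</iothreads>

  <!-- Optional: pin iothread 1 to a host CPU (CPU 2 is an assumption) -->
  <cputune>
    <iothreadpin iothread='1' cpuset='2'/>
  </cputune>

  <devices>
    <!-- Disk definition as posted, served by iothread 1 -->
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' io='threads' iothread='1'/>
      <source dev='/dev/nvme0n1p1'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </disk>
  </devices>
</domain>
```

With this layout each disk references a single iothread id, which matches the limitation described above. Whether one disk can be spread across several iothreads depends on the libvirt and QEMU versions in use.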