The following patches apply over mst's vhost branch and were tested against that branch and also mkp's 5.13 branch, which has some vhost-scsi changes.

These patches allow us to support multiple vhost workers per device. I ended up just doing Stefan's original idea where userspace has the kernel create a worker and we pass back the pid. The benefit over the workqueue and userspace thread approaches is that we only have one'ish code path in the kernel.

The kernel patches here allow us to then do N workers per device and also share workers across devices.

I included a patch for qemu so you can get an idea of how it works.

TODO:
-----
- polling
- Allow sharing workers across devices. Kernel support is added and I hacked up userspace to test, but I'm still working on a sane way to manage it in userspace.
- Bind to specific CPUs. Commands like "virsh emulatorpin" work with these patches and allow us to set the group of vhost threads to different CPUs. But we can also set a specific vq's worker to run on a CPU.
- I'm handling old kernels by just checking for EPERM. Does this require a feature?

Results:
--------
When running with the null_blk driver and vhost-scsi I can get 1.2 million IOPs by just running a simple

    fio --filename=/dev/sda --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --numjobs=8 --time_based --group_reporting --name=iops --runtime=60 --eta-newline=1

The VM has 8 vCPUs and sda has 8 virtqueues and we can do a total of 1024 cmds per device. To get 1.2 million IOPs I did have to tune and run the virsh emulatorpin command so the vhost threads were running on different CPUs than the VM. If the vhost threads share CPUs then I get around 800K.

For a more realistic device that is also a CPU hog, like iscsi, I can still get 1 million IOPs using 1 dm-multipath device over 8 iscsi paths (natively it gets 1.1 million IOPs).
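
For the "Bind to specific CPUs" item, here is a minimal sketch of how userspace might pin a single worker once it has the pid the kernel passed back. It is written in Python only for brevity and is not taken from the qemu patch. pin_worker and worker_pid are hypothetical names, and the sketch assumes the worker thread accepts affinity changes through the ordinary sched_setaffinity(2) call and that the caller has enough privilege to change it.

```python
import os


def pin_worker(worker_pid, cpus):
    """Restrict the task `worker_pid` to the CPU ids in `cpus`.

    `worker_pid` stands in for the pid handed back when the worker is
    created (assumption: it can be affined like any other task).
    Returns the affinity set the kernel reports afterwards.
    """
    os.sched_setaffinity(worker_pid, set(cpus))
    return os.sched_getaffinity(worker_pid)


if __name__ == "__main__":
    # Demo against the calling process (pid 0 means "self"), so it can
    # run without a vhost device. Pick a CPU we are already allowed on.
    original = os.sched_getaffinity(0)
    cpu = min(original)
    print("pinned to:", pin_worker(0, [cpu]))
    # Put the original affinity back so the demo has no lasting effect.
    os.sched_setaffinity(0, original)
```

With a real worker you would pass the returned pid instead of 0, once per virtqueue worker, which is roughly what emulatorpin does for the whole group of vhost threads but at a finer granularity.
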
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization