Avi Kivity <avi@xxxxxxxxxx> wrote on 09/08/2010 01:17:34 PM:

> On 09/08/2010 10:28 AM, Krishna Kumar wrote:
> > Following patches implement Transmit mq in virtio-net. Also
> > included are the user qemu changes.
> >
> > 1. This feature was first implemented with a single vhost.
> >    Testing showed 3-8% performance gain for up to 8 netperf
> >    sessions (and sometimes 16), but BW dropped with more
> >    sessions. However, implementing per-txq vhost improved
> >    BW significantly all the way to 128 sessions.
>
> Why were vhost kernel changes required? Can't you just instantiate more
> vhost queues?

I did try using a single thread to process packets from multiple vq's
on the host, but the BW dropped beyond a certain number of sessions. I
don't have the code and performance numbers for that right now since it
is a bit ancient; I can try to resuscitate it if you want.

> > Guest interrupts for a 4 TXQ device after a 5 min test:
> >
> > # egrep "virtio0|CPU" /proc/interrupts
> >            CPU0     CPU1     CPU2     CPU3
> > 40:           0        0        0        0   PCI-MSI-edge  virtio0-config
> > 41:      126955   126912   126505   126940   PCI-MSI-edge  virtio0-input
> > 42:      108583   107787   107853   107716   PCI-MSI-edge  virtio0-output.0
> > 43:      300278   297653   299378   300554   PCI-MSI-edge  virtio0-output.1
> > 44:      372607   374884   371092   372011   PCI-MSI-edge  virtio0-output.2
> > 45:      162042   162261   163623   162923   PCI-MSI-edge  virtio0-output.3
>
> How are vhost threads and host interrupts distributed? We need to move
> vhost queue threads to be colocated with the related vcpu threads (if no
> extra cores are available) or on the same socket (if extra cores are
> available). Similarly, move device interrupts to the same core as the
> vhost thread.

All my testing was without any tuning, including binding netperf &
netserver (irqbalance is also off). I assume (maybe wrongly) that the
tuning above might give better results? Are you suggesting this
combination:

        IRQs on guest:
                40: CPU0
                41: CPU1
                42: CPU2
                43: CPU3
        (all CPUs are on socket #0)

        vhost:
                thread #0: CPU0
                thread #1: CPU1
                thread #2: CPU2
                thread #3: CPU3

        qemu:
                thread #0: CPU4
                thread #1: CPU5
                thread #2: CPU6
                thread #3: CPU7
        (all CPUs are on socket #1)

        netperf/netserver:
                Run on CPUs 0-4 on both sides

The reason I did not optimize anything from user space is that I felt
it was important to show that the defaults work reasonably well. A
rough sketch of how such a pinning could be applied is appended below.

Thanks,

- KK
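
[Appendix] A minimal sketch of the pinning experiment discussed above,
not part of the patches: pin_task() would be run on the host against the
vhost/qemu thread TIDs, and pin_irq() inside the guest against the
virtio0-* vectors (41-45 in the table above). The TIDs, IRQ numbers and
CPU numbers used in main() are purely illustrative; in practice they
would come from ps -eLf and /proc/interrupts.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Pin one task (e.g. a vhost or qemu thread) to a single CPU (host side). */
static int pin_task(pid_t tid, int cpu)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return sched_setaffinity(tid, sizeof(set), &set);
}

/* Steer one interrupt to a single CPU by writing a hex CPU mask
 * to /proc/irq/<irq>/smp_affinity (guest side). */
static int pin_irq(int irq, int cpu)
{
        char path[64];
        FILE *f;
        int ret = 0;

        snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
        f = fopen(path, "w");
        if (!f)
                return -1;
        if (fprintf(f, "%x\n", 1u << cpu) < 0)
                ret = -1;
        if (fclose(f))
                ret = -1;
        return ret;
}

int main(void)
{
        /* Illustrative values only: vhost thread with TID 1234 -> CPU0,
         * and IRQ 42 (virtio0-output.0) -> CPU2. */
        if (pin_task(1234, 0))
                perror("sched_setaffinity");
        if (pin_irq(42, 2))
                perror("smp_affinity");
        return 0;
}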