Re: [RFC PATCH 0/4] Implement multiqueue virtio-net

Avi Kivity <avi@xxxxxxxxxx> wrote on 09/08/2010 01:17:34 PM:

>   On 09/08/2010 10:28 AM, Krishna Kumar wrote:
> > Following patches implement Transmit mq in virtio-net.  Also
> > included is the user qemu changes.
> >
> > 1. This feature was first implemented with a single vhost.
> >     Testing showed 3-8% performance gain for upto 8 netperf
> >     sessions (and sometimes 16), but BW dropped with more
> >     sessions.  However, implementing per-txq vhost improved
> >     BW significantly all the way to 128 sessions.
>
> Why were vhost kernel changes required?  Can't you just instantiate more
> vhost queues?

I did try using a single thread to process packets from multiple
vqs on the host, but the BW dropped beyond a certain number of
sessions. I don't have the code and performance numbers for that
right now since that work is fairly old, but I can try to revive
it if you want.

> > Guest interrupts for a 4 TXQ device after a 5 min test:
> > # egrep "virtio0|CPU" /proc/interrupts
> >        CPU0     CPU1     CPU2    CPU3
> > 40:   0        0        0       0        PCI-MSI-edge  virtio0-config
> > 41:   126955   126912   126505  126940   PCI-MSI-edge  virtio0-input
> > 42:   108583   107787   107853  107716   PCI-MSI-edge  virtio0-output.0
> > 43:   300278   297653   299378  300554   PCI-MSI-edge  virtio0-output.1
> > 44:   372607   374884   371092  372011   PCI-MSI-edge  virtio0-output.2
> > 45:   162042   162261   163623  162923   PCI-MSI-edge  virtio0-output.3
>
> How are vhost threads and host interrupts distributed?  We need to move
> vhost queue threads to be colocated with the related vcpu threads (if no
> extra cores are available) or on the same socket (if extra cores are
> available).  Similarly, move device interrupts to the same core as the
> vhost thread.
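
For reference, the per-queue totals in the quoted /proc/interrupts
output can be summed with a short script. This is only a sketch: the
sample text is copied from the table above, and the function name and
the fixed count of four CPU columns are assumptions made for
illustration.

```python
# Sketch: sum the per-CPU counts for each IRQ line of /proc/interrupts.
# SAMPLE is copied from the quoted table; per_irq_totals is a hypothetical helper.
SAMPLE = """\
40:   0        0        0       0        PCI-MSI-edge  virtio0-config
41:   126955   126912   126505  126940   PCI-MSI-edge  virtio0-input
42:   108583   107787   107853  107716   PCI-MSI-edge  virtio0-output.0
43:   300278   297653   299378  300554   PCI-MSI-edge  virtio0-output.1
44:   372607   374884   371092  372011   PCI-MSI-edge  virtio0-output.2
45:   162042   162261   163623  162923   PCI-MSI-edge  virtio0-output.3
"""

def per_irq_totals(text, ncpus=4):
    """Map each interrupt name (last column) to its count summed over all CPUs."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the CPU header and anything that is not an "NN:" IRQ row.
        if not fields or not fields[0].endswith(":"):
            continue
        counts = [int(x) for x in fields[1:1 + ncpus]]
        totals[fields[-1]] = sum(counts)
    return totals

if __name__ == "__main__":
    for name, total in per_irq_totals(SAMPLE).items():
        print(f"{name:20s} {total}")
```

In this sample each IRQ is spread almost evenly across the four CPUs,
while the TX queues themselves differ: output.2 took roughly three and
a half times as many interrupts as output.0.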

All my testing was done without any tuning: netperf and netserver
were not bound to CPUs, and irqbalance was off. I assume (maybe
wrongly) that the tuning you describe might give better results?
Are you suggesting this combination:
	IRQ on guest:
		40: CPU0
		41: CPU1
		42: CPU2
		43: CPU3 (all CPUs are on socket #0)
	vhost:
		thread #0:  CPU0
		thread #1:  CPU1
		thread #2:  CPU2
		thread #3:  CPU3
	qemu:
		thread #0:  CPU4
		thread #1:  CPU5
		thread #2:  CPU6
		thread #3:  CPU7 (all CPUs are on socket #1)
	netperf/netserver:
		Run on CPUs 0-4 on both sides
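
As a rough illustration of what applying the guest IRQ part of such a
layout would involve, the sketch below turns the IRQ-to-CPU mapping
listed above into the hex masks that /proc/irq/<n>/smp_affinity
accepts. The helper names are made up for this example, the mapping is
simply the one proposed above, and none of this was used in the tests
reported here.

```python
# Sketch only: IRQ numbers and CPU assignments mirror the proposed layout;
# cpu_mask and affinity_writes are hypothetical helper names.

def cpu_mask(cpu):
    """Return the hex bitmask for a single CPU (valid as written for CPUs 0-31)."""
    return format(1 << cpu, "x")

# Guest IRQ -> guest CPU, as listed in the proposal.
GUEST_IRQ_TO_CPU = {40: 0, 41: 1, 42: 2, 43: 3}

def affinity_writes(irq_to_cpu):
    """List (path, mask) pairs; writing each mask to its path as root pins the IRQ."""
    return [(f"/proc/irq/{irq}/smp_affinity", cpu_mask(cpu))
            for irq, cpu in sorted(irq_to_cpu.items())]

if __name__ == "__main__":
    # Print the equivalent shell commands instead of touching /proc directly.
    for path, mask in affinity_writes(GUEST_IRQ_TO_CPU):
        print(f"echo {mask} > {path}")
```

The host-side vhost and qemu threads could be pinned in a similar
spirit with taskset or os.sched_setaffinity(), assuming their thread
IDs are known and the kernel allows their affinity to be changed.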

The reason I did not optimize anything from user space is that I
felt it was important to show that the default works reasonably well.

Thanks,

- KK
