> Subject: Re: Network throughput limits for local VM <-> VM communication
>
> On 06/17/2009 10:36 AM, Fischer, Anna wrote:
>>
>> /usr/bin/qemu-system-x86_64 -m 1024 -smp 2 -name FC10-2 -uuid
>> b811b278-fae2-a3cc-d51d-8f5b078b2477 -boot c -drive
>> file=,if=ide,media=cdrom,index=2 -drive
>> file=/var/lib/libvirt/images/FC10-2.img,if=virtio,index=0,boot=on
>> -net nic,macaddr=54:52:00:11:ae:79,model=e1000 -net tap
>> -net nic,macaddr=54:52:00:11:ae:78,model=e1000 -net tap
>> -serial pty -parallel none -usb -vnc 127.0.0.1:2 -k en-gb
>> -soundhw es1370
>
> Okay, like I suspected, qemu has a trap here and you walked into it.
> The -net option plugs the device you specify into a virtual hub. The
> command line you provided plugs the two virtual NICs and the two tap
> devices into one virtual hub, so any packet received from any of the
> four clients will be propagated to the other three.
>
> To get this to work right, specify the vlan= parameter, which says
> which virtual hub a component is plugged into. Note this has nothing
> to do with 802.blah vlans.
>
> So your command line should look like
>
>   qemu ... -net nic,...,vlan=0 -net tap,...,vlan=0
>            -net nic,...,vlan=1 -net tap,...,vlan=1
>
> This will give you two virtual hubs, each bridging a virtual NIC to a
> tap device.
>
>> This is my "routing VM" that has two network interfaces and routes
>> packets between two subnets. It has one interface plugged into
>> bridge virbr0 and the other interface plugged into virbr1:
>>
>> brctl show
>> bridge name  bridge id          STP enabled  interfaces
>> virbr0       8000.8ac1d18c63ec  no           vnet0
>>                                              vnet1
>> virbr1       8000.2ebfcbb9ed70  no           vnet2
>>                                              vnet3
>
> Please redo the tests with qemu vlans but without 802.blah vlans, so
> we see what happens without packet duplication.

Avi, thanks for your quick reply. I do use the vlan= parameter now,
and I no longer see packet duplication, so everything you said is
right, and I now understand why I was seeing packets on both bridges
before. So this has nothing to do with tun/tap, just with the way QEMU
"virtual hubs" work - I did not know the details of that before. Even
with vlan= set, however, I still see the same issues with weird CPU
utilization and low throughput that I describe below.
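For reference, the network part of my routing VM's command line now
looks roughly like this (MAC addresses as before; the remaining
options are unchanged and elided here):

  # drive/display/other options exactly as in my original invocation
  /usr/bin/qemu-system-x86_64 ... \
    -net nic,vlan=0,macaddr=54:52:00:11:ae:79,model=e1000 -net tap,vlan=0 \
    -net nic,vlan=1,macaddr=54:52:00:11:ae:78,model=e1000 -net tap,vlan=1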
>> If I use the e1000 virtual NIC model, I see performance drop
>> significantly compared to using virtio_net. However, with virtio_net
>> I have the network stalling after a few seconds of high-throughput
>> traffic (as I mentioned in my previous post). Just to reiterate my
>> scenario: I run three guests on the same physical machine, one guest
>> is my routing VM that is routing IP network traffic between the
>> other two guests.
>>
>> I am also wondering about the fact that I do not seem to get CPU
>> utilization maxed out in this case while throughput does not go any
>> higher. I do not understand what is stopping KVM from using more CPU
>> for guest I/O processing? There is nothing else running on my
>> machine. I have analyzed the amount of CPU that each KVM thread is
>> using, and I can see that the thread running the VCPU of the routing
>> VM, which is processing interrupts of the e1000 virtual network
>> card, is using the highest amount of CPU. Is there any way that I
>> can optimize my network set-up? Maybe some specific configuration of
>> the e1000 driver within the guest? Are there any known issues with
>> this?
>
> There are known issues with lack of flow control while sending
> packets out of a guest. If the guest runs tcp that tends to correct
> for it, but if you run a lower level protocol that doesn't have its
> own flow control, the guest may spend a lot of cpu generating packets
> that are eventually dropped. We are working on fixing this.

For the tests I run now (with vlan= set) I am using both TCP and UDP,
and I see the problem with virtio_net for both protocols. What I am
wondering about, though, is that I have no problems when the two
guests communicate directly (i.e. when I plug them into the same
bridge and put them on the same network). Why do I only see network
communication stall when there is a routing VM in the path? Is this
just because the system is even more overloaded in that case? Or could
it be related to the dual-NIC configuration, or to the fact that I run
multiple bridges on the same physical machine? Also, when you say "We
are working on fixing this" - which parts of the code are you working
on? Is this in the QEMU network I/O processing code, or is it
virtio_net related?

>> I also see very different CPU utilization and network throughput
>> figures when pinning threads to CPU cores using taskset. At one
>> point I managed to double the throughput, but I could not reproduce
>> that set-up for some reason. What are the major issues that I would
>> need to pay attention to when pinning threads to cores, in order to
>> optimize my specific set-up so that I can achieve better network I/O
>> performance?
>
> It's black magic, unfortunately. But please retry with the fixed
> configuration and we'll continue from there.

Retry with "the fixed configuration"? You mean setting the vlan=
parameter? I have already used vlan= for the latest tests, so the CPU
utilization issues I am describing do happen with that configuration.

Thanks,
Anna
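P.S. In case it helps to reproduce the pinning experiments: what I did
was along the following lines (the thread ID and core number below are
only illustrative; they were different on my machine):

  # list the qemu threads (TIDs) of the routing VM
  ps -eLf | grep qemu-system-x86_64
  # pin one thread, e.g. TID 4242, to core 2
  taskset -p -c 2 4242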