Re: Network shutdown under load

Hi,

We are currently seeing this problem on two production servers:
two to four times a day, one network interface shuts down. We run
four KVM guests on two hosts (two per host). Every VM has eth0 and
eth1, both using virtio_net. On each host, all eth0 interfaces are
attached to bridge br0 and all eth1 interfaces to br1. Here are the
startup options for one VM (the others are nearly identical, apart
from the MAC addresses and so on):

/usr/bin/kvm -m 8192 -smp 8 -cpu host -daemonize -k de -vnc 127.0.0.1:1
-monitor telnet:172.18.105.46:4444,server,nowait -localtime -pidfile
/tmp/kvm-dodoma.pid -drive
file=/data/kvm/kvmimages/dodoma.qcow2,if=virtio,cache=none,boot=on
-drive file=/data/kvm/kvmimages/dodoma-vdb.qcow2,if=virtio,cache=none
-net nic,vlan=104,model=virtio,macaddr=00:ff:48:e5:4b:8d -net
tap,vlan=104,ifname=tap.b.dodoma,script=no -net
nic,vlan=96,model=virtio,macaddr=00:ff:48:e5:4b:8f -net
tap,vlan=96,ifname=tap.f.dodoma,script=no

I have tried the latest Gentoo 2.6.30 kernel on both host and
guest (all hosts and VMs run Gentoo, by the way). With kernel
2.6.31 on the host and 2.6.30 on the guest the problem still exists.
I have tried KVM 0.11.1, 0.12.1.2 and 0.12.2 with both 2.6.30
and 2.6.31 on the host side.

Interestingly, all VMs carry almost the same amount of network
traffic (in and out), but the VMs running Apache bound to eth1 have
the biggest problems: they lose eth1 two to four times a day.
eth0 keeps running fine even though it carries almost the same
amount of traffic. That traffic comes in from the database, whereas
eth1 sends traffic out to the proxy (Varnish). So incoming traffic
seems to work fine here, while outgoing traffic is problematic. On the
other hand, the VMs running Varnish receive all their traffic through
eth1, and there I have seen "only" one shutdown of eth1 in 48 hours.
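
To check the in/out observation above, the guest's interface counters
can be sampled directly. The following is only a minimal sketch, not
something from this thread: the function names and the STATS_ROOT
override are made up for illustration, and it assumes the guest
exposes the usual /sys/class/net/<iface>/statistics counters. Note
that an idle interface also looks "stalled", so the result only means
something while traffic is expected.

```shell
#!/bin/sh
# Hypothetical helper: report whether an interface counter stopped moving.
# STATS_ROOT is an illustrative override so the sketch can be tried
# against a fake directory tree; by default it points at sysfs.
STATS_ROOT="${STATS_ROOT:-/sys/class/net}"

# Print the current value of one counter, e.g. rx_packets or tx_packets.
read_counter() {
    cat "$STATS_ROOT/$1/statistics/$2"
}

# counter_stalled IFACE COUNTER [INTERVAL]
# Returns 0 if the counter did not change over INTERVAL seconds,
# 1 if it changed, 2 if the counter could not be read.
counter_stalled() {
    iface="$1"
    counter="$2"
    interval="${3:-10}"
    before="$(read_counter "$iface" "$counter")" || return 2
    sleep "$interval"
    after="$(read_counter "$iface" "$counter")" || return 2
    [ "$before" = "$after" ]
}
```

For example, running "counter_stalled eth1 rx_packets 10" and
"counter_stalled eth1 tx_packets 10" from cron and logging the time
of each hit would show which direction stops first.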

Is there anything I can do to help debug this problem? Is a fix
already available? Otherwise I will really have to install KVM-88,
which runs fine on some other hosts.

Thanks!
Robert


Tom Lendacky wrote:
> There's been some discussion of this already in the kvm list, but I want to 
> summarize what I've found and also include the qemu-devel list in an effort to 
> find a solution to this problem.
>
> Running a netperf test between two kvm guests results in the guest's network 
> interface shutting down. I originally found this using kvm guests on two 
> different machines that were connected via a 10GbE link.  However, I found 
> this problem can be easily reproduced using two guests on the same machine.
>
> I am running the 2.6.32 level of the kvm.git tree and the 0.12.1.2 level of 
> the qemu-kvm.git tree.
>
> The setup includes two bridges, br0 and br1.
>
> The commands used to start the guests are as follows:
> usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 \
>   -drive file=/autobench/var/tmp/cape-vm001-raw.img,if=virtio,index=0,media=disk,boot=on \
>   -net nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51,netdev=cape-vm001-eth0 \
>   -netdev tap,id=cape-vm001-eth0,script=/autobench/var/tmp/ifup-kvm-br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 \
>   -net nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1,netdev=cape-vm001-eth1 \
>   -netdev tap,id=cape-vm001-eth1,script=/autobench/var/tmp/ifup-kvm-br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 \
>   -vnc :1 -monitor telnet::5701,server,nowait -snapshot -daemonize
>
> usr/local/bin/qemu-system-x86_64 -name cape-vm002 -m 1024 \
>   -drive file=/autobench/var/tmp/cape-vm002-raw.img,if=virtio,index=0,media=disk,boot=on \
>   -net nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:61,netdev=cape-vm002-eth0 \
>   -netdev tap,id=cape-vm002-eth0,script=/autobench/var/tmp/ifup-kvm-br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 \
>   -net nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:E1,netdev=cape-vm002-eth1 \
>   -netdev tap,id=cape-vm002-eth1,script=/autobench/var/tmp/ifup-kvm-br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 \
>   -vnc :2 -monitor telnet::5702,server,nowait -snapshot -daemonize
>
> The ifup-kvm-br0 script takes the (first) qemu-created tap device, brings 
> it up and adds it to bridge br0.  The ifup-kvm-br1 script takes the (second) 
> qemu-created tap device, brings it up and adds it to bridge br1.
>
> Each ethernet device within a guest is on its own subnet.  For example:
>   guest 1 eth0 has addr 192.168.100.32 and eth1 has addr 192.168.101.32
>   guest 2 eth0 has addr 192.168.100.64 and eth1 has addr 192.168.101.64
>
> On one of the guests run netserver:
>   netserver -L 192.168.101.32 -p 12000
>
> On the other guest run netperf:
>   netperf -L 192.168.101.64 -H 192.168.101.32 -p 12000 -t TCP_STREAM -l 60 -c -C -- -m 16K -M 16K
>
> It may take more than one netperf run (I find that my second run almost always 
> causes the shutdown) but the network on the eth1 links will stop working.
>
> I did some debugging and found that in qemu on the guest running netserver:
>  - the receive_disabled variable is set and never gets reset
>  - the read_poll event handler for the eth1 tap device is disabled and never 
> re-enabled
> These conditions result in no packets being read from the tap device and sent 
> to the guest - effectively shutting down the network.  Network connectivity 
> can be restored by shutting down the guest interfaces, unloading the 
> virtio_net module, re-loading the virtio_net module and re-starting the guest 
> interfaces.
>
> I'm continuing to work on debugging this, but would appreciate if some folks 
> with more qemu network experience could try to recreate and debug this.
>
> If my kernel config matters, I can provide that.
>
> Thanks,
> Tom
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   
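
The recovery steps Tom describes above (take the guest interfaces
down, unload virtio_net, reload it, bring the interfaces back up) can
be sketched as a small shell helper to run inside the affected guest.
This is an illustration only, not a script from the thread: the
function names and the DRY_RUN and IFACES variables are made up, and
it assumes the "ip" and "modprobe" tools are available. Bringing the
link up with "ip" does not restore addresses or routes, so in
practice the distribution's own network scripts would probably be
used for that step.

```shell
#!/bin/sh
# Hypothetical recovery helper for a guest whose virtio_net interface
# stopped receiving. DRY_RUN and IFACES are illustrative knobs only.
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands
IFACES="${IFACES:-eth0 eth1}"      # guest interfaces backed by virtio_net

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_virtio_net() {
    # 1. Shut down the guest interfaces.
    for i in $IFACES; do
        run ip link set "$i" down
    done
    # 2. Unload and reload the virtio_net module.
    run modprobe -r virtio_net
    run modprobe virtio_net
    # 3. Bring the interfaces back up (addresses must be re-applied
    #    separately, e.g. by the distribution's network scripts).
    for i in $IFACES; do
        run ip link set "$i" up
    done
}
```

Calling "recover_virtio_net" with the default DRY_RUN=1 only prints
the commands. Setting DRY_RUN=0 and running it as root from the VNC
console (not over the affected network) would actually execute them.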