Re: network shutdown under heavy load

Tom Lendacky wrote:
On Wednesday 20 January 2010 09:48:04 am Tom Lendacky wrote:
On Tuesday 19 January 2010 05:57:53 pm Chris Wright wrote:
* Tom Lendacky (tahm@xxxxxxxxxxxxxxxxxx) wrote:
On Wednesday 13 January 2010 03:52:28 pm Chris Wright wrote:
(Mark cc'd, sound familiar?)

* Tom Lendacky (tahm@xxxxxxxxxxxxxxxxxx) wrote:
On Sunday 10 January 2010 06:38:54 am Avi Kivity wrote:
On 01/10/2010 02:35 PM, Herbert Xu wrote:
On Sun, Jan 10, 2010 at 02:30:12PM +0200, Avi Kivity wrote:
This isn't in 2.6.27.y.  Herbert, can you send it there?
It appears that now that TX is fixed we have a similar problem
with RX.  Once I figure that one out I'll send them together.
I've been experiencing the network shutdown issue also.  I've been
running netperf tests across 10GbE adapters with Qemu 0.12.1.2,
RHEL5.4 guests and 2.6.32 kernel (from kvm.git) guests.  I
instrumented Qemu to print out some network statistics.  It appears
that at some point in the netperf test the receiving guest ends up
having the 10GbE device "receive_disabled" variable in its
VLANClientState structure stuck at 1. From looking at the code it
appears that the virtio-net driver in the guest should cause
qemu_flush_queued_packets in net.c to eventually run and clear the
"receive_disabled" variable, but it's not happening.  I don't seem
to have these issues when many debug options are enabled in the
guest kernel, which results in very poor network performance; maybe
some kind of race condition?
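To make the suspected stall concrete, here is a minimal Python model of the flow-control latch described above. It is based only on the description in this thread, not on QEMU's source. It assumes that a failed delivery (no free guest receive buffer) sets receive_disabled to 1 and queues the packet. It also assumes that only an explicit flush, analogous to qemu_flush_queued_packets and triggered when the guest posts new buffers, clears the latch. The class and method names (NetClient, deliver, add_rx_buffers, flush_queued_packets) and the notify parameter are hypothetical.

```python
from collections import deque


class NetClient:
    """Hypothetical model of a receiving NIC with a receive_disabled latch."""

    def __init__(self, rx_buffers=0):
        self.rx_buffers = rx_buffers   # free receive buffers posted by the guest
        self.receive_disabled = 0      # latch set when a delivery fails
        self.queue = deque()           # packets held back while the latch is set
        self.delivered = []            # packets that reached the guest

    def _try_receive(self, pkt):
        # Succeeds only if the guest has a free buffer for the packet.
        if self.rx_buffers == 0:
            return False
        self.rx_buffers -= 1
        self.delivered.append(pkt)
        return True

    def deliver(self, pkt):
        # While the latch is set, nothing is delivered; the packet is queued.
        if self.receive_disabled or not self._try_receive(pkt):
            self.receive_disabled = 1
            self.queue.append(pkt)

    def flush_queued_packets(self):
        # Clears the latch and retries queued packets in arrival order.
        self.receive_disabled = 0
        while self.queue:
            if not self._try_receive(self.queue[0]):
                self.receive_disabled = 1
                break
            self.queue.popleft()

    def add_rx_buffers(self, n, notify=True):
        # Guest posts n new buffers; notify=False models a flush that never runs.
        self.rx_buffers += n
        if notify:
            self.flush_queued_packets()


nic = NetClient(rx_buffers=1)
nic.deliver("p1")                     # consumes the only buffer
nic.deliver("p2")                     # no buffer: latch set, p2 queued
nic.add_rx_buffers(4, notify=False)   # buffers posted, but no flush happens
nic.deliver("p3")                     # refused anyway, because the latch is still set
```

In this model, buffers are available but nothing clears the latch, so traffic stays stuck until something calls the flush. That matches the symptom of receive_disabled stuck at 1.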
Ok, here's an update. After realizing that none of the ethtool offload
options were enabled in my guest, I found that I needed to be using the
-netdev option on the qemu command line.  Once I did that, some ethtool
offload options were enabled and the deadlock did not appear when I did
networking between guests on different machines.  However, the deadlock
did appear when I did networking between guests on the same machine.
What does your full command line look like?  And when the networking
stops does your same receive_disabled hack make things work?
The command line when using the -net option for the tap device is:

/usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 \
  -drive file=/autobench/var/tmp/cape-vm001-raw.img,if=virtio,index=0,media=disk,boot=on \
  -net nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51 \
  -net tap,vlan=0,script=/autobench/var/tmp/ifup-kvm-br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 \
  -net nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1 \
  -net tap,vlan=1,script=/autobench/var/tmp/ifup-kvm-br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 \
  -vnc :1 -monitor telnet::5701,server,nowait -snapshot -daemonize

The command line when using the -netdev option for the tap device is:

/usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 \
  -drive file=/autobench/var/tmp/cape-vm001-raw.img,if=virtio,index=0,media=disk,boot=on \
  -net nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51,netdev=cape-vm001-eth0 \
  -netdev tap,id=cape-vm001-eth0,script=/autobench/var/tmp/ifup-kvm-br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 \
  -net nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1,netdev=cape-vm001-eth1 \
  -netdev tap,id=cape-vm001-eth1,script=/autobench/var/tmp/ifup-kvm-br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 \
  -vnc :1 -monitor telnet::5701,server,nowait -snapshot -daemonize


The first ethernet device is a 1GbE device for communicating with the
automation infrastructure we have.  The second ethernet device is the 10GbE
device that the netperf tests run on.

I can get the networking to work again by bringing down the interfaces and
reloading the virtio_net module (modprobe -r virtio_net / modprobe
virtio_net).

I haven't had a chance yet to run the tests against a modified version of
qemu that does not set the receive_disabled variable.

I got a chance to run with the setting of the receive_disabled variable commented out, and I still run into the problem. It's easier to reproduce when running netperf between two guests on the same machine. I instrumented qemu and virtio a bit to try to track this down.

What I'm seeing is that, with two guests on the same machine, the receiving (netserver) guest eventually gets into a state where the tap read poll callback is disabled and never re-enabled, so packets are never delivered from tap to qemu and on to the guest. On the sending (netperf) side, the transmit queue eventually runs out of capacity and the guest can no longer send packets (I believe this part is unique to having both guests on the same machine). As before, bringing down the interfaces, reloading the virtio_net module, and bringing the interfaces back up clears things up.
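The stall described above is a back-pressure loop that never resumes. The following Python model illustrates it; it is an illustration only, not QEMU's actual code. It assumes that the tap backend stops reading from its file descriptor when the guest NIC cannot accept a packet. It also assumes that the backend relies on a completion callback, fired when the NIC later flushes its queue, to turn reading back on. The names Receiver, Tap, send_async, poll_once and _send_completed, and the flush parameter, are hypothetical.

```python
from collections import deque


class Receiver:
    """Hypothetical guest NIC: accepts a packet only if a buffer is free."""

    def __init__(self, rx_buffers):
        self.rx_buffers = rx_buffers
        self.pending = deque()        # (packet, on_sent) pairs waiting for buffers
        self.delivered = []

    def send_async(self, pkt, on_sent):
        # Returns True if delivered now; otherwise queues it and returns False.
        if self.rx_buffers > 0 and not self.pending:
            self.rx_buffers -= 1
            self.delivered.append(pkt)
            return True
        self.pending.append((pkt, on_sent))
        return False

    def add_rx_buffers(self, n, flush=True):
        self.rx_buffers += n
        if not flush:
            return                    # models the missing flush: the sender is never told
        while self.pending and self.rx_buffers > 0:
            pkt, on_sent = self.pending.popleft()
            self.rx_buffers -= 1
            self.delivered.append(pkt)
            on_sent()                 # completion callback re-enables the sender


class Tap:
    """Hypothetical tap backend whose read poll is switched off under back-pressure."""

    def __init__(self, peer):
        self.peer = peer
        self.backlog = deque()        # packets waiting on the host side
        self.read_poll = True

    def poll_once(self):
        # One event-loop iteration: read a packet only if polling is enabled.
        if not self.read_poll or not self.backlog:
            return
        pkt = self.backlog.popleft()
        if not self.peer.send_async(pkt, self._send_completed):
            self.read_poll = False    # stop reading until the peer catches up

    def _send_completed(self):
        self.read_poll = True


rx = Receiver(rx_buffers=1)
tap = Tap(rx)
tap.backlog.extend(["p1", "p2", "p3"])
for _ in range(3):
    tap.poll_once()                   # p1 delivered; p2 queued and read poll turned off
rx.add_rx_buffers(8, flush=False)
for _ in range(3):
    tap.poll_once()                   # read poll is still off, so p3 is never read
```

The design point the model highlights is that re-enabling the reader depends on a single notification path. If that path is missed, the guest has free buffers but the host-side backlog never drains. This is consistent with reloading virtio_net being enough to recover.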

Tom

thanks,
-chris
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hi,

It seems that I am hitting the same bug.

I have a guest with a high network load, and after some time the network stops working entirely.
From inside the guest, I can no longer ping the gateway.
If I restart the guest, everything works fine again.

My environment:
Debian/Squeeze

host2-kvm:~# uname -a
Linux host2-kvm 2.6.33-rc6-git4-jp #3 SMP Thu Feb 4 17:13:38 CET 2010 x86_64 GNU/Linux

It's a 2.6.33 kernel with these two patches from Patrick McHardy (Netfilter):
http://patchwork.kernel.org/patch/76980/
http://patchwork.kernel.org/patch/76980/

host2-kvm:~# virsh version
Compiled against library: libvir 0.7.6
Using library: libvir 0.7.6
Using API: QEMU 0.7.6
Running hypervisor: QEMU 0.12.2

Under Debian/Lenny, with a 2.6.26 kernel, I don't encounter this bug.

Can someone tell me whether there is a kernel option I should enable to debug this?

Many thanks.

Regards.

Jean-Philippe Menil
Network Administrator (Administrateur Réseau)
Université de Nantes, IRTS - DSI
2, rue de la Houssinière, BP 92208, 44322 Nantes Cedex 3, Loire-Atlantique, France
Tel: 02.51.12.53.92 / Fax: 02.51.12.58.60
jean-philippe.menil@xxxxxxxxxxxxxx
http://www.cri.univ-nantes.fr

