[Bug 60620] New: guest loses frequently (multiple times per day!) connectivity to network device

bugzilla-daemon@xxxxxxxxxxxxxxxxxxx · Wed, 24 Jul 2013 19:31:41 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=60620

            Bug ID: 60620
           Summary: guest loses frequently (multiple times per day!)
                    connectivity to network device
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.8, 3.9, 3.10 in both the host as well as the guest
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@xxxxxxxxxxxxxxxxxxxx
          Reporter: folkert@xxxxxxxxxxxxxx
        Regression: No

Hi,

I have a server with plenty of free ram (16GB free when running the guests).
Each guest has 8GB of ram.
One guest has 3 network interfaces. They are connected to bridges in the host
(is it called "DOM0" in KVM as well?). That guest now frequently (multiple
times per day) loses all connectivity on one of those interfaces. If I then
ping any host connected to that interface, no reply comes back: only a message
about insufficient buffer space. As a test I increased wmem and friends, but
that did not help at all. I also removed the bridge for that one problematic
interface and replaced it with a direct connection: did not help.
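
For reference, the socket buffer limits can be read back to confirm that a
change took effect. This is a minimal Python sketch, not what was run for this
report: it assumes that "wmem and friends" means the net.core sysctls listed
in BUFFER_SYSCTLS, which the report does not state.

```python
from pathlib import Path

# Assumption: "wmem and friends" refers to these net.core sysctls; the
# report does not say which ones were actually changed.
BUFFER_SYSCTLS = ("wmem_default", "wmem_max", "rmem_default", "rmem_max")


def read_buffer_sysctls(base="/proc/sys/net/core"):
    """Return {name: int value} for each buffer sysctl readable under base."""
    values = {}
    for name in BUFFER_SYSCTLS:
        path = Path(base) / name
        try:
            values[name] = int(path.read_text().split()[0])
        except (OSError, ValueError, IndexError):
            # Missing, empty or non-numeric entry: skip it.
            continue
    return values


if __name__ == "__main__":
    for name, value in sorted(read_buffer_sysctls().items()):
        print(f"net.core.{name} = {value}")
```

Running it in the guest before and after a change shows whether the new limits
are really in place.
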
Neither the host nor the guest logs anything to dmesg, syslog or anywhere else.
The only thing that helps is completely rebooting the guest. Ifdown in the
guest and/or host does not help.
This problem happens only with 1 guest and only with 1 interface. I verified
that the other adapters can use their networks just fine. When it is used in
bridging mode, the host is still able to use it; only the guest isn't.
When the guest is connected via a bridge and it fails, the "dropped packets"
counter starts increasing for every packet sent out. If it is "directly"
connected, this counter does not increase.
I did some googling and found the suggestion to modprobe the vhost_net module:
did not help. Using e1000 instead of virtio: did not help.
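
The "dropped packets" counter described above can be watched without ifconfig
by sampling the per-interface statistics in sysfs. The following Python sketch
is only an illustration: the interface name "vnet0" is hypothetical, and which
device (guest NIC or host tap device) shows the increase is not specified in
this report.

```python
import sys
import time
from pathlib import Path

COUNTERS = ("tx_packets", "tx_dropped", "rx_packets", "rx_dropped")


def read_counters(iface, base="/sys/class/net"):
    """Read the packet and drop counters of one interface from sysfs."""
    stats = Path(base) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def counter_delta(before, after):
    """Return how much each counter grew between two samples."""
    return {name: after[name] - before[name] for name in before}


if __name__ == "__main__":
    # "vnet0" is a placeholder; pass the real interface name as an argument.
    iface = sys.argv[1] if len(sys.argv) > 1 else "vnet0"
    before = read_counters(iface)
    time.sleep(5)
    print(counter_delta(before, read_counters(iface)))
```

If tx_dropped grows by the same amount as the number of packets sent during
the interval, that matches the behaviour seen in the bridged setup.
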
I verified that there were not too many sockets open: only about 800. Note
that this exact configuration ran fine for years on real hardware. Also, the
guest frequently has plenty of free RAM, mostly 5GB (not even used by
cache/buffers, as the problem also happens after a couple of minutes of
uptime).
I tried disabling STP on the bridge: did not help.
Apart from that increasing "dropped packets" counter, there is also another
difference between the direct connection and the connection via a bridge:

20:19:03.910892 ARP, Request who-has 192.168.178.2 tell 192.168.178.1, length 46
20:19:04.906854 ARP, Request who-has 192.168.178.2 tell 192.168.178.1, length 46
20:19:05.493445 ARP, Request who-has 192.168.178.2 tell 192.168.178.83, length 46
20:19:05.903027 ARP, Request who-has 192.168.178.2 tell 192.168.178.1, length 46
20:19:06.490750 ARP, Request who-has 192.168.178.2 tell 192.168.178.83, length 46
...

.2 is the problematic guest, and .83 and .1 are indeed devices in that
network! So ARP requests come in but no replies go out. Also, no other traffic
comes in: this is the interface to the internet and normally it has a constant
input of data (e.g. NTP requests, VPN data (tinc), web-server requests, mail,
etc.).
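
The "requests in, no replies out" pattern can be confirmed on a longer capture
by counting ARP requests and replies per address. A minimal Python sketch
follows. The request pattern matches the lines shown above; the reply pattern
("Reply <ip> is-at ...") is an assumption about tcpdump's usual text output,
since this capture contains no replies.

```python
import re
from collections import Counter

# \s+ is used throughout so that mail-wrapped capture lines still match.
REQUEST = re.compile(r"ARP,\s+Request\s+who-has\s+(\S+)\s+tell\s+")
# Assumed reply format; no reply appears in the capture from this report.
REPLY = re.compile(r"ARP,\s+Reply\s+(\S+)\s+is-at\s+")

SAMPLE = """\
20:19:03.910892 ARP, Request who-has 192.168.178.2 tell 192.168.178.1, length 46
20:19:04.906854 ARP, Request who-has 192.168.178.2 tell 192.168.178.1, length 46
20:19:05.493445 ARP, Request who-has 192.168.178.2 tell 192.168.178.83, length 46
20:19:05.903027 ARP, Request who-has 192.168.178.2 tell 192.168.178.1, length 46
20:19:06.490750 ARP, Request who-has 192.168.178.2 tell 192.168.178.83, length 46
"""


def summarize_arp(capture_text):
    """Map each requested IP to (number of requests, number of replies)."""
    requests = Counter(m.group(1) for m in REQUEST.finditer(capture_text))
    replies = Counter(m.group(1) for m in REPLY.finditer(capture_text))
    return {ip: (requests[ip], replies[ip]) for ip in requests}


if __name__ == "__main__":
    print(summarize_arp(SAMPLE))  # prints {'192.168.178.2': (5, 0)}
```

An address with many requests and zero replies, as for 192.168.178.2 here, is
the one that has stopped answering.
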

Versions used:

pxe-qemu        1.0.0+git-20120202.f6840ba-3
kvm     1:1.1.2+dfsg-6
qemu    1.1.2+dfsg-6a
qemu-keymaps    1.1.2+dfsg-6a
qemu-kvm        1.1.2+dfsg-6
qemu-system     1.1.2+dfsg-6a
qemu-user       1.1.2+dfsg-6a
qemu-utils      1.1.2+dfsg-6a

Kernels: 3.8, 3.9 and 3.10.
Both on the guest and the host.
