Re: NIC Stability Problems Under Xen 4.4 / CentOS 6 / Linux 3.18

Have you tried disabling all power management features across the board?

Are the devices connected to the same network infrastructure?

There has to be something common.

I've been using Intel NICs with Xen/CentOS for ages with no issues.

Karel

On 27.1.2017 02:57, Kevin Stange wrote:
On 01/26/2017 02:08 PM, Kevin Stange wrote:
On 01/26/2017 09:35 AM, Johnny Hughes wrote:
On 01/26/2017 09:32 AM, Johnny Hughes wrote:
On 01/25/2017 11:49 AM, Kevin Stange wrote:
On 01/24/2017 11:16 AM, Kevin Stange wrote:
On 01/24/2017 09:10 AM, Konrad Rzeszutek Wilk wrote:
On Tue, Jan 24, 2017 at 09:29:39PM +0800, -=X.L.O.R.D=- wrote:
Kevin Stange,
It could be the kernel; otherwise, try updating the NIC driver or the
firmware of the NIC card. Hope that helps!

Xlord
-----Original Message-----
From: CentOS-virt [mailto:centos-virt-bounces@xxxxxxxxxx] On Behalf Of Kevin
Stange
Sent: Tuesday, January 24, 2017 1:04 AM
To: centos-virt@xxxxxxxxxx
Subject:  NIC Stability Problems Under Xen 4.4 / CentOS 6 /
Linux 3.18

<snip>

Has anyone experienced similar issues with this configuration, and if so,
does anyone have tips on how to resolve the issues?

Honestly I would email Intel and see if they can help. This looks like
the NIC decides something is wrong, throws off a PCIe error and
then resets itself.

This happens for several different NICs.  Is there a good contact at
Intel for this kind of thing, or should I just try to reach them through
their web site?

It could also be an error in the Linux stack which would "eat" an
interrupt when migrating interrupts (which was fixed
upstream, see below). Are you running irqbalance? Could you try
turning it off?

irqbalance is enabled on these servers.  I'll try disabling it.
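
One way to check whether a NIC's interrupts are actually moving between CPUs is to compare snapshots of /proc/interrupts taken before and after stopping irqbalance. The following is only a minimal sketch, assuming the standard /proc/interrupts layout (a header row of CPU names, then one row per IRQ). The function name nic_irq_counts and the "eth0" fragment are illustrative and do not come from this thread.

```python
def nic_irq_counts(interrupts_text, name_fragment):
    """Return {irq_label: [count_cpu0, count_cpu1, ...]} for every row of
    /proc/interrupts whose description contains name_fragment."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    # The header row lists one column name per CPU (CPU0, CPU1, ...).
    n_cpus = len(lines[0].split())
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an IRQ row
        fields = rest.split()
        counts = []
        # Per-CPU counters come first; stop at the first non-numeric field.
        for field in fields[:n_cpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        if name_fragment in description:
            result[label.strip()] = counts
    return result


# Hypothetical usage on a live host (interface name is an assumption):
#   with open("/proc/interrupts") as f:
#       print(nic_irq_counts(f.read(), "eth0"))
```

If the counters for a given IRQ keep growing on more than one CPU between two snapshots, that interrupt is still being migrated.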

I stopped irqbalance yesterday afternoon, but a hypervisor's NICs
failed anyway early this morning, so I'm pretty sure this is not the
right tree to bark up.


Here is a set of drivers/firmware from Intel for those NICs:

https://downloadcenter.intel.com/download/15817/Intel-Network-Adapter-Driver-for-PCI-E-Gigabit-Network-Connections-under-Linux-

I will see if I can get a CentOS-6 build of the latest version of that
from our older SRPM:

http://vault.centos.org/6.7/xen4/Source/SPackages/e1000e-2.5.4-3.10.68.2.el6.centos.alt.src.rpm

I am currently very busy with several c5, c6, c7 updates and the i686
altarch c7 tree .. but I have this on my list.  In the meantime, maybe
someone else could also see if those drivers help you (or you could try
to compile / install it).

Do you have another machine that you can use to see if you can duplicate
the issue NOT running the xen.gz hypervisor boot, but just the straight
kernel?

I can't actually reproduce this problem reliably.  It happens randomly
when the servers are up and running anywhere between a few hours and a
month or more, and I haven't been able to isolate any specific way to
cause it to happen.  As a result I can't really test different solutions
on different servers to see what helps.  I was hoping other people were
seeing it so that I could get some direction.  If I can reproduce it, it
won't take me very long to identify what the cause is.  Right now if I
do upgrade the drivers on the systems I won't really know if it's fixed
until I don't see another issue for several months.

Actually .. I think this is the driver for you:

https://downloadcenter.intel.com/download/13663

And this explains how to make it work:

http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000005767.html

The different combinations of NICs overlap both the e1000e and igb
drivers, but the most egregious issues have been with the igb ones.
I'll try to give this a shot and report back if I still see issues with
a server after doing so, but it might be a week or two before I find out.
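
To see which interfaces on a given host are bound to igb and which to e1000e, the driver symlink in sysfs can be read. This is a minimal sketch, assuming the usual /sys/class/net/<iface>/device/driver layout; nic_drivers is a hypothetical helper name, not something from this thread.

```python
import os


def nic_drivers(sysfs_net="/sys/class/net"):
    """Return {interface: driver_name} for interfaces backed by a device.

    Virtual interfaces (lo, bridges, vifs) have no device/driver symlink
    and are skipped.
    """
    drivers = {}
    for iface in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, iface, "device", "driver")
        if os.path.islink(link):
            # The symlink target ends in the driver name, e.g. ".../igb".
            drivers[iface] = os.path.basename(os.readlink(link))
    return drivers
```

Calling nic_drivers() on a hypervisor returns a dict mapping each physical interface name to its kernel driver, which makes it easier to correlate failures with igb or e1000e.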

So the NICs giving issues in most cases were using the igb driver.  I've
tried replacing the drivers on some HVs with the version you suggested,
but it doesn't seem to have helped with stability.  Any other ideas?
