Re: NIC Stability Problems Under Xen 4.4 / CentOS 6 / Linux 3.18

On 11/02/17 06:29, Kevin Stange wrote:
On 01/30/2017 06:41 PM, Kevin Stange wrote:
On 01/30/2017 06:12 PM, Adi Pircalabu wrote:
On 31/01/17 10:49, Kevin Stange wrote:
You said 3.x kernels specifically. The kernel on Xen Made Easy now is a
4.4 kernel.  Any chance you have tested with that one?

Not yet, however the future Xen nodes we'll deploy will run CentOS 7 and
Xen with kernel 4.4.

I'll keep you (and others here) posted on my own experiences with that
4.4 build over the next few weeks to report on any issues.  I'm hoping
something happened between 3.18 and 4.4 that fixed underlying problems.

Did you ever try without MTU=9000 (default 1500 instead)?

Yes, also with all sorts of configuration combinations like LACP rate
slow/fast, "options ixgbe LRO=0,0" and so on. No improvement.

Alright, I'll assume that probably won't help then.  I tried it on one
box which hasn't had the issue again yet, but that doesn't guarantee
anything.

I was able to discover something new, which might not conclusively prove
anything, but it at least seems to rule out the pci=nomsi kernel option
as an effective fix.

I had one server booted with that option as well as MTU 1500.  It was
stable for quite a long time, so I decided to try turning the MTU back
to 9000 and within 12 hours, the interface on the expansion NIC with the
jumbo MTU failed.
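
When repeating this kind of test across several hosts, it helps to confirm which interfaces actually carry a jumbo MTU before and after the change. The following is only a rough sketch and not something taken from this thread: it assumes the standard Linux sysfs layout, where each interface exposes its MTU in /sys/class/net/<iface>/mtu. The function name and the 1500 threshold are illustrative choices.

```python
import os


def interfaces_with_jumbo_mtu(sysfs_root="/sys/class/net", standard_mtu=1500):
    """Return {interface: mtu} for interfaces whose MTU exceeds standard_mtu.

    Assumes the usual sysfs layout: <sysfs_root>/<iface>/mtu holds the MTU
    as a decimal integer. Entries without a readable mtu file are skipped.
    """
    result = {}
    for name in sorted(os.listdir(sysfs_root)):
        mtu_path = os.path.join(sysfs_root, name, "mtu")
        try:
            with open(mtu_path) as fh:
                mtu = int(fh.read().strip())
        except (OSError, ValueError):
            # Not an interface directory, or the value is unreadable.
            continue
        if mtu > standard_mtu:
            result[name] = mtu
    return result


if __name__ == "__main__":
    for iface, mtu in interfaces_with_jumbo_mtu().items():
        print(f"{iface}: MTU {mtu}")
```

Run on the host itself, this lists only the interfaces above the standard MTU, which makes it easy to log alongside the time an interface fails.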

The other NIC in the LACP bundle is onboard and didn't fail.  The other
NIC on the dual-port expansion card also didn't fail.  This leads me to
believe that ONE of the bugs I'm experiencing is related to 82575EB +
jumbo frames.

I still think I'm also having a PCI-e issue that is separate and
additional on top of that, and which has not reared its head recently,
making it difficult for me to gather any new data.

One of the things I've done that seemed to help a lot with stability was
to balance the LACP so that one NIC from the onboard ports and one NIC
from the expansion card are in each LAG.  Previously we just had the
first LAG onboard and the second on the expansion card.  This way, at
least, given the expansion NIC's propensity toward failing first, I
don't have to crash the server and all running VMs to recover.
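
The balanced layout described above can be checked mechanically. This is a minimal sketch under stated assumptions, not the setup actually used here: it parses the text the Linux bonding driver exposes in /proc/net/bonding/<bond> (the "Slave Interface:" and "MII Status:" lines) and checks that a bond has at least one member from each group of ports. The interface names eth0 through eth3 and their split into onboard and expansion ports are hypothetical.

```python
def parse_bond_slaves(text):
    """Return {slave_name: mii_status} from /proc/net/bonding/<bond> text."""
    slaves = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("MII Status:") and current is not None:
            # The first MII Status line after a slave header belongs to it;
            # the bond-level MII Status appears before any slave and is skipped.
            slaves[current] = line.split(":", 1)[1].strip()
            current = None
    return slaves


def is_balanced(slaves, onboard, expansion):
    """True if the bond has at least one onboard and one expansion member."""
    names = set(slaves)
    return bool(names & set(onboard)) and bool(names & set(expansion))


# Hypothetical excerpt in the format printed by the bonding driver.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth2
MII Status: down
"""

if __name__ == "__main__":
    members = parse_bond_slaves(SAMPLE)
    print(members)
    print(is_balanced(members, onboard={"eth0", "eth1"}, expansion={"eth2", "eth3"}))
```

On a real host the text would come from reading /proc/net/bonding/bond0 (or whatever the bond is called) instead of SAMPLE. A slave reported as down while its partner stays up matches the failure pattern described above, where the bond survives on the onboard port.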

I've seen absolutely no issues yet with the 4.4 kernel either, but I am
not willing to call that a win, because even the servers on which no
tweaks have been applied have been quiet lately.

Thanks for the heads-up Kevin, appreciated. One thing I need to clarify, though: what kernel was this machine running at the time?

Adi Pircalabu
_______________________________________________
CentOS-virt mailing list
CentOS-virt@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos-virt


