poor TCP network performance with CX-5 cards starting from the 4.16 kernel

Hi folks,

Please excuse me if this is not the right place to post this (according
to the MAINTAINERS entry for the Mellanox MLX5 Ethernet driver, it
should be). I'm having issues with the latest kernel (4.18-rc4) and
Mellanox CX-5 cards. I'm sending TCP traffic, not RDMA traffic, using
iperf3, and I've noticed performance problems.
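For anyone reproducing this, here is a minimal sketch of pulling the
receiver-side throughput out of one iperf3 run. It assumes the tests
are run with iperf3's JSON output (`-J`) and that the TCP summary sits
under end.sum_received.bits_per_second; the sample text is a trimmed,
hypothetical report, not output from these machines.

```python
import json


def received_gbps(iperf3_json_text):
    """Return receiver-side throughput in Gbit/s for one `iperf3 -J` TCP run.

    Assumption: the JSON layout iperf3 uses for TCP tests, with the
    aggregate result under end.sum_received.bits_per_second.
    """
    report = json.loads(iperf3_json_text)
    bps = report["end"]["sum_received"]["bits_per_second"]
    return bps / 1e9  # bits/s -> Gbit/s


# Trimmed, hypothetical report containing only the field read above.
sample = '{"end": {"sum_received": {"bits_per_second": 21500000000.0}}}'
print(f"{received_gbps(sample):.1f} Gbit/s")
```

Running the same client once normally and once with iperf3's `-R`
(reverse) flag measures both directions without swapping which host
runs the server.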

Below are some data points. I have tested several kernels between
4.15-rc4 and 4.18-rc4. What I notice is asymmetric flow performance:
the "good" direction gets 20+G and the "bad" direction gets 2+G. I
measure performance over multiple runs (10). While 4.15-rc4 did not
always reach symmetric 20+G performance (details below), poor
performance becomes much more prevalent starting with the 4.16 kernel.

In 4.15-rc4, 6 out of 10 runs show good performance (20+G) in the bad
direction. Performance in the other direction is mostly 28+G (7 out of
10 runs; in the remaining 3 runs it drops to 15+G).
In 4.16, 3 out of 10 runs show good performance (20+G) in the bad
direction. Performance in the other direction is mostly 20+G (7 out of
10 runs; in the remaining 3 runs it drops to 15G).
In 4.17, 0 out of 10 runs show good performance in the bad direction.
Performance in the other direction is mostly 20+G (7 out of 10 runs; in
the remaining 3 runs it drops to 5G).
In 4.18-rc4, 0 out of 10 runs show good performance in the "bad"
direction. The "good" direction is now also pretty bad: 7 out of 10
runs had throughput between 7 and 10G, and 3 runs had 11-17G.
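The "X out of 10 runs" figures above amount to a simple tally per
kernel and direction. A minimal sketch of that tally, assuming per-run
throughputs are collected in Gbit/s and that "good" means at least 20G;
the helper name and the sample values are hypothetical, not measurements
from this report.

```python
from dataclasses import dataclass

GOOD_THRESHOLD_GBPS = 20.0  # assumption: "good" = 20+G, as used in this report


@dataclass
class RunSummary:
    total: int
    good: int
    min_gbps: float
    max_gbps: float


def summarize_runs(throughputs_gbps, threshold=GOOD_THRESHOLD_GBPS):
    """Count runs at or above the threshold and record the observed range."""
    if not throughputs_gbps:
        raise ValueError("need at least one run")
    good = sum(1 for t in throughputs_gbps if t >= threshold)
    return RunSummary(
        total=len(throughputs_gbps),
        good=good,
        min_gbps=min(throughputs_gbps),
        max_gbps=max(throughputs_gbps),
    )


# Hypothetical per-run results for one direction (NOT data from this report).
runs = [22.1, 2.3, 21.7, 2.5, 20.4, 2.2, 23.0, 21.9, 2.4, 20.8]
s = summarize_runs(runs)
print(f"{s.good} out of {s.total} runs >= {GOOD_THRESHOLD_GBPS:g}G "
      f"(range {s.min_gbps:g}-{s.max_gbps:g}G)")
```

Reporting the min/max alongside the count keeps bimodal results (a mix
of ~2G and ~20G runs) visible, which a plain average would hide.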

What I found by accident is that if the CPUs are busy (e.g. doing a
kernel compile) while a test is running, the bad-direction performance
improves from 2G to about 15G.

ethtool reports TX global pause frames on the server-side interface in
the bad direction. No pause frames are logged on the other machine.
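To see which side emits pause frames during a given run, one option is
to snapshot `ethtool -S <iface>` on each machine before and after the
iperf3 test and diff the counters whose names contain "pause". This is
a minimal sketch under that assumption; counter names differ between
drivers and firmware versions, so the names in the sample text are
illustrative, not output captured from these machines.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S <iface>` style "name: value" lines into {name: int}."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue  # header lines such as "NIC statistics:" carry no number
    return stats


def pause_deltas(before_text, after_text):
    """Return {counter: increase} for pause-related counters that changed."""
    before = parse_ethtool_stats(before_text)
    after = parse_ethtool_stats(after_text)
    deltas = {}
    for name, value in after.items():
        if "pause" not in name:
            continue
        delta = value - before.get(name, 0)
        if delta != 0:
            deltas[name] = delta
    return deltas


# Illustrative snapshots; counter names are assumptions, not real output.
before = """NIC statistics:
     rx_packets: 100
     tx_pause_ctrl_phy: 10
     rx_pause_ctrl_phy: 0
"""
after = """NIC statistics:
     rx_packets: 900
     tx_pause_ctrl_phy: 250
     rx_pause_ctrl_phy: 0
"""
print(pause_deltas(before, after))
```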

Could some code change that went into 4.16 be killing performance?

I'm not sure what other information would be useful. The two machines
are identical in terms of CPUs, memory, CX-5 cards, and kernel. The
CX-5 cards have the latest firmware. Some testing was done with the
machines connected back to back to rule out the switch as the problem,
but the numbers presented here are from a switched configuration.

Here's some information about the CPU (from /proc/cpuinfo):
processor       : 23
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
stepping        : 7
microcode       : 0x710
cpu MHz         : 2408.246
cache size      : 15360 KB

Thank you.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


