Hi folks,

Please excuse me if this is not the right place to post this (according to the MAINTAINERS entry for the Mellanox MLX5 Ethernet driver, this should be it).

I'm having performance issues with the latest kernel (4.18-rc4) using Mellanox CX-5 cards. I'm sending TCP traffic (not RDMA traffic) using iperf3. I have tested several kernels between 4.15-rc4 and 4.18-rc4, and below are the data points I have.

What I see is asymmetric flow performance: the "good" direction gets 20+G while the "bad" direction gets 2+G. I measure performance over multiple runs (10). While 4.15-rc4 did not always reach symmetric 20+G performance either (details below), the poor performance becomes more prevalent starting with the 4.16 kernel:

- 4.15-rc4: 6 out of 10 runs show good performance (20+G) in the bad direction. Performance in the other direction is mostly 28+G (7 out of 10 runs; in the other 3 runs it drops to 15+G).
- 4.16: 3 out of 10 runs show good performance (20+G). Performance in the other direction is mostly 20+G (7 out of 10 runs; in the other 3 runs it drops to 15G).
- 4.17: 0 out of 10 runs show good performance. Performance in the other direction is mostly 20+G (7 out of 10 runs; in the other 3 runs it drops to 5G).
- 4.18-rc4: 0 out of 10 runs show good performance in the "bad" direction. The "good" direction is now also quite poor: 7 out of 10 runs had throughput between 7 and 10G, and 3 runs had 11 to 17G.

What I found by accident is that if the CPUs are busy (i.e. doing a kernel compile) while running a test, the bad-direction performance improves, rising from 2G to about 15G.

ethtool is logging TX global pause frames on the interface of the server side of the bad direction. No pause frames are logged on the other machine.

Could some code change that went into 4.16 be killing performance? I'm not sure what other information I could provide that would be useful.
The two machines are identical in terms of CPUs, memory, CX-5 cards, and kernel. The CX-5 cards have the latest firmware. Some testing was done with the machines connected back to back to rule out the switch as the problem, but the numbers presented here are from a switched configuration.

Here's some info about the CPU:

    processor   : 23
    vendor_id   : GenuineIntel
    cpu family  : 6
    model       : 45
    model name  : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
    stepping    : 7
    microcode   : 0x710
    cpu MHz     : 2408.246
    cache size  : 15360 KB

Thank you.