On 14/10/20 11:15, Jesper Dangaard Brouer wrote:
> On Wed, 14 Oct 2020 08:56:43 +0200
> Federico Parola <fede.parola@xxxxxxxxxx> wrote:
> [...]
>>> Can you try to use this[2] tool:
>>>   ethtool_stats.pl --dev enp101s0f0
>>> And notice if there are any strange counters.
>>>
>>> [2] https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
>>>
>>> My best guess is that you have Ethernet flow-control enabled. Some
>>> ethtool counter might show if that is the case.
>>
>> Here are the results of the tool:
>>
>> 1 FLOW:
>> Show adapter(s) (enp101s0f0) statistics (ONLY that changed!)
>> Ethtool(enp101s0f0) stat:   35458700 (    35,458,700) <= port.fdir_sb_match /sec
>> Ethtool(enp101s0f0) stat: 2729223958 ( 2,729,223,958) <= port.rx_bytes /sec
>> Ethtool(enp101s0f0) stat:    7185397 (     7,185,397) <= port.rx_dropped /sec
>> Ethtool(enp101s0f0) stat:   42644155 (    42,644,155) <= port.rx_size_64 /sec
>> Ethtool(enp101s0f0) stat:   42644140 (    42,644,140) <= port.rx_unicast /sec
>> Ethtool(enp101s0f0) stat: 1062159456 ( 1,062,159,456) <= rx-0.bytes /sec
>> Ethtool(enp101s0f0) stat:   17702658 (    17,702,658) <= rx-0.packets /sec
>> Ethtool(enp101s0f0) stat: 1062155639 ( 1,062,155,639) <= rx_bytes /sec
>> Ethtool(enp101s0f0) stat:   17756128 (    17,756,128) <= rx_dropped /sec
>> Ethtool(enp101s0f0) stat:   17702594 (    17,702,594) <= rx_packets /sec
>> Ethtool(enp101s0f0) stat:   35458743 (    35,458,743) <= rx_unicast /sec
>>
>> ---
>>
>> 4 FLOWS:
>> Show adapter(s) (enp101s0f0) statistics (ONLY that changed!)
>> Ethtool(enp101s0f0) stat:    9351001 (     9,351,001) <= port.fdir_sb_match /sec
>> Ethtool(enp101s0f0) stat: 2559136358 ( 2,559,136,358) <= port.rx_bytes /sec
>> Ethtool(enp101s0f0) stat:   30635346 (    30,635,346) <= port.rx_dropped /sec
>> Ethtool(enp101s0f0) stat:   39986386 (    39,986,386) <= port.rx_size_64 /sec
>> Ethtool(enp101s0f0) stat:   39986799 (    39,986,799) <= port.rx_unicast /sec
>> Ethtool(enp101s0f0) stat:  140177834 (   140,177,834) <= rx-0.bytes /sec
>> Ethtool(enp101s0f0) stat:    2336297 (     2,336,297) <= rx-0.packets /sec
>> Ethtool(enp101s0f0) stat:  140260002 (   140,260,002) <= rx-1.bytes /sec
>> Ethtool(enp101s0f0) stat:    2337667 (     2,337,667) <= rx-1.packets /sec
>> Ethtool(enp101s0f0) stat:  140261431 (   140,261,431) <= rx-2.bytes /sec
>> Ethtool(enp101s0f0) stat:    2337691 (     2,337,691) <= rx-2.packets /sec
>> Ethtool(enp101s0f0) stat:  140175690 (   140,175,690) <= rx-3.bytes /sec
>> Ethtool(enp101s0f0) stat:    2336262 (     2,336,262) <= rx-3.packets /sec
>> Ethtool(enp101s0f0) stat:  560877338 (   560,877,338) <= rx_bytes /sec
>> Ethtool(enp101s0f0) stat:       3354 (         3,354) <= rx_dropped /sec
>> Ethtool(enp101s0f0) stat:    9347956 (     9,347,956) <= rx_packets /sec
>> Ethtool(enp101s0f0) stat:    9351183 (     9,351,183) <= rx_unicast /sec
>>
>> So if I understand correctly, the field port.rx_dropped represents
>> packets dropped due to a lack of buffers on the NIC, while rx_dropped
>> represents packets dropped because the upper layers aren't able to
>> process them, am I right? It seems that the problem is in the NIC.
>
> Yes, it seems that the problem is in the NIC hardware, or in the
> config of the NIC hardware.
>
> Look at the counter "port.fdir_sb_match":
>  - 1 flow:  35,458,700 = port.fdir_sb_match /sec
>  - 4 flows:  9,351,001 = port.fdir_sb_match /sec
>
> I think fdir_sb translates to Flow Director Sideband filter (in the
> driver code this is sometimes related to "ATR" (Application Targeted
> Routing)). (Note: I've seen fdir_match before, but not the "sb" part
> in fdir_sb_match.)
>
> This is happening inside the NIC HW/FW, which filters on flows and
> makes sure packets of the same flow go to the same RX-queue number to
> avoid out-of-order (OOO) packets.
>
> This looks like the limiting factor in your setup. Have you installed
> any filters yourself?
>
> Try to disable Flow Director:
>   ethtool -K ethX ntuple <on|off>

Yes, I'm using flow filters to manually steer traffic to different
queues/cores. However, disabling ntuple doesn't solve the problem: the
port.fdir_sb_match counter disappears, but the number of packets
dropped in port.rx_dropped stays high.
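For reference, the knobs discussed above map to something like the
following ethtool invocations (a rough sketch; enp101s0f0 is the device
from the stats, and exact feature/rule output can vary by driver):

  # Show the RX flow classification rules currently installed
  # (the manually added Flow Director sideband filters).
  ethtool -n enp101s0f0

  # Disable ntuple / Flow Director filtering.
  ethtool -K enp101s0f0 ntuple off

  # Check whether Ethernet flow-control (pause frames) is active,
  # and turn it off if it is.
  ethtool -a enp101s0f0
  ethtool -A enp101s0f0 rx off tx off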
The only solution I've found so far is to reduce the size of the RX
ring, as I mentioned in my previous post. However, I still see a
decrease in performance when going beyond 4 cores.
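For reference, the ring resize amounts to something like this (512 is
only an illustrative value, not necessarily the size I actually used):

  # Show current and maximum RX/TX ring sizes.
  ethtool -g enp101s0f0

  # Shrink the RX ring (illustrative value).
  ethtool -G enp101s0f0 rx 512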
Federico