On 14/10/20 08:56, Federico Parola wrote:
Thanks for your help!

I forgot to mention, I have manually configured the IRQ affinity to map every queue on a different core, and running your command confirms that one core per queue/flow is used.

On 13/10/20 18:44, Toke Høiland-Jørgensen wrote:

Federico Parola <fede.parola@xxxxxxxxxx> writes:

Hello, I'm testing the performance of XDP when dropping packets using multiple cores and I'm getting unexpected results.
My machine is equipped with a dual port Intel XL710 40 GbE NIC and an Intel Xeon Gold 5120 CPU @ 2.20GHz with 14 cores (HyperThreading disabled), running Ubuntu server 18.04 with kernel 5.8.12.
I'm using the xdp_rxq_info program from the kernel tree samples to drop packets. I generate 64-byte UDP packets with MoonGen for a total of 42 Mpps. Packets are uniformly distributed over different flows (different src port) and I use flow director rules on the rx NIC to send these flows to different queues/cores.

Here are my results:

1 FLOW:
Running XDP on dev:enp101s0f0 (ifindex:3) action:XDP_DROP options:no_touch
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      0       17784270    0
XDP-RX CPU      total   17784270
RXQ stats       RXQ:CPU pps         issue-pps
rx_queue_index    0:0   17784270    0
rx_queue_index    0:sum 17784270

---

2 FLOWS:
Running XDP on dev:enp101s0f0 (ifindex:3) action:XDP_DROP options:no_touch
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      0       7016363     0
XDP-RX CPU      1       7017291     0
XDP-RX CPU      total   14033655
RXQ stats       RXQ:CPU pps         issue-pps
rx_queue_index    0:0   7016366     0
rx_queue_index    0:sum 7016366
rx_queue_index    1:1   7017294     0
rx_queue_index    1:sum 7017294

---

4 FLOWS:
Running XDP on dev:enp101s0f0 (ifindex:3) action:XDP_DROP options:no_touch
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      0       2359478     0
XDP-RX CPU      1       2358508     0
XDP-RX CPU      2       2357042     0
XDP-RX CPU      3       2355396     0
XDP-RX CPU      total   9430425
RXQ stats       RXQ:CPU pps         issue-pps
rx_queue_index    0:0   2359474     0
rx_queue_index    0:sum 2359474
rx_queue_index    1:1   2358504     0
rx_queue_index    1:sum 2358504
rx_queue_index    2:2   2357040     0
rx_queue_index    2:sum 2357040
rx_queue_index    3:3   2355392     0
rx_queue_index    3:sum 2355392

I don't understand why overall performance decreases as I add cores; according to [1] I would expect it to increase until reaching a maximum value. Is there any parameter I should tune to overcome the problem?

Yeah, this does look a bit odd. My immediate thought is that maybe your RXQs are not pinned to the cores correctly? There is nothing in xdp_rxq_info that ensures this, you have to configure the IRQ affinity manually. If you don't do this, I suppose the processing could be bouncing around on different CPUs, leading to cache line contention when updating the stats map.

You can try to look at what the actual CPU load is on each core - 'mpstat -P ALL -n 1' is my goto for this.

-Toke
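For readers following the IRQ-affinity suggestion, the manual pinning can be done roughly as follows (a sketch only: the IRQ numbers are placeholders that have to be read from /proc/interrupts on the actual system, and irqbalance, if running, should be stopped so it does not undo the settings):

  # list the IRQs used by the NIC's queues
  grep enp101s0f0 /proc/interrupts

  # keep irqbalance from rewriting the affinities
  sudo systemctl stop irqbalance

  # pin queue 0's IRQ to CPU 0, queue 1's to CPU 1, and so on
  # (120 and 121 are made-up IRQ numbers)
  echo 0 | sudo tee /proc/irq/120/smp_affinity_list
  echo 1 | sudo tee /proc/irq/121/smp_affinity_list

  # confirm one busy core per queue/flow while traffic is running
  mpstat -P ALL 1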
On 13/10/20 18:41, Jesper Dangaard Brouer wrote:

This is what I see with i40e:

Running XDP on dev:i40e2 (ifindex:6) action:XDP_DROP options:no_touch
XDP stats       CPU     pps          issue-pps
XDP-RX CPU      1       8,411,547    0
XDP-RX CPU      2       2,804,016    0
XDP-RX CPU      3       2,803,600    0
XDP-RX CPU      4       5,608,380    0
XDP-RX CPU      5       13,999,125   0
XDP-RX CPU      total   33,626,671
RXQ stats       RXQ:CPU pps          issue-pps
rx_queue_index    0:3   2,803,600    0
rx_queue_index    0:sum 2,803,600
rx_queue_index    1:1   8,411,540    0
rx_queue_index    1:sum 8,411,540
rx_queue_index    2:2   2,804,015    0
rx_queue_index    2:sum 2,804,015
rx_queue_index    3:5   8,399,326    0
rx_queue_index    3:sum 8,399,326
rx_queue_index    4:4   5,608,372    0
rx_queue_index    4:sum 5,608,372
rx_queue_index    5:5   5,599,809    0
rx_queue_index    5:sum 5,599,809

That is strange, as my results above show that it does scale in my testlab on the same NIC, an i40e (Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)).

Can you try to use this[2] tool:

  ethtool_stats.pl --dev enp101s0f0

and notice if there are any strange counters.

[2] https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl

My best guess is that you have Ethernet flow-control enabled. Some ethtool counter might show if that is the case.

Here are the results of the tool:

1 FLOW:
Show adapter(s) (enp101s0f0) statistics (ONLY that changed!)
Ethtool(enp101s0f0) stat:   35458700 (     35,458,700) <= port.fdir_sb_match /sec
Ethtool(enp101s0f0) stat: 2729223958 (  2,729,223,958) <= port.rx_bytes /sec
Ethtool(enp101s0f0) stat:    7185397 (      7,185,397) <= port.rx_dropped /sec
Ethtool(enp101s0f0) stat:   42644155 (     42,644,155) <= port.rx_size_64 /sec
Ethtool(enp101s0f0) stat:   42644140 (     42,644,140) <= port.rx_unicast /sec
Ethtool(enp101s0f0) stat: 1062159456 (  1,062,159,456) <= rx-0.bytes /sec
Ethtool(enp101s0f0) stat:   17702658 (     17,702,658) <= rx-0.packets /sec
Ethtool(enp101s0f0) stat: 1062155639 (  1,062,155,639) <= rx_bytes /sec
Ethtool(enp101s0f0) stat:   17756128 (     17,756,128) <= rx_dropped /sec
Ethtool(enp101s0f0) stat:   17702594 (     17,702,594) <= rx_packets /sec
Ethtool(enp101s0f0) stat:   35458743 (     35,458,743) <= rx_unicast /sec

---

4 FLOWS:
Show adapter(s) (enp101s0f0) statistics (ONLY that changed!)
Ethtool(enp101s0f0) stat:    9351001 (      9,351,001) <= port.fdir_sb_match /sec
Ethtool(enp101s0f0) stat: 2559136358 (  2,559,136,358) <= port.rx_bytes /sec
Ethtool(enp101s0f0) stat:   30635346 (     30,635,346) <= port.rx_dropped /sec
Ethtool(enp101s0f0) stat:   39986386 (     39,986,386) <= port.rx_size_64 /sec
Ethtool(enp101s0f0) stat:   39986799 (     39,986,799) <= port.rx_unicast /sec
Ethtool(enp101s0f0) stat:  140177834 (    140,177,834) <= rx-0.bytes /sec
Ethtool(enp101s0f0) stat:    2336297 (      2,336,297) <= rx-0.packets /sec
Ethtool(enp101s0f0) stat:  140260002 (    140,260,002) <= rx-1.bytes /sec
Ethtool(enp101s0f0) stat:    2337667 (      2,337,667) <= rx-1.packets /sec
Ethtool(enp101s0f0) stat:  140261431 (    140,261,431) <= rx-2.bytes /sec
Ethtool(enp101s0f0) stat:    2337691 (      2,337,691) <= rx-2.packets /sec
Ethtool(enp101s0f0) stat:  140175690 (    140,175,690) <= rx-3.bytes /sec
Ethtool(enp101s0f0) stat:    2336262 (      2,336,262) <= rx-3.packets /sec
Ethtool(enp101s0f0) stat:  560877338 (    560,877,338) <= rx_bytes /sec
Ethtool(enp101s0f0) stat:       3354 (          3,354) <= rx_dropped /sec
Ethtool(enp101s0f0) stat:    9347956 (      9,347,956) <= rx_packets /sec
Ethtool(enp101s0f0) stat:    9351183 (      9,351,183) <= rx_unicast /sec

So if I understand correctly, the field port.rx_dropped represents packets dropped due to a lack of buffers on the NIC, while rx_dropped represents packets dropped because the upper layers aren't able to process them, am I right?
It seems that the problem is in the NIC.

Federico
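A quick way to test the flow-control guess above is to look at the pause settings with ethtool (a sketch; whether pause frames can be toggled at runtime depends on the NIC and firmware, and the generator side would need the same check):

  # show the current pause (flow-control) parameters
  ethtool -a enp101s0f0

  # disable RX/TX pause frames if they turn out to be enabled
  sudo ethtool -A enp101s0f0 rx off tx off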
I was able to scale up to 4 cores by reducing the size of the rx ring from 512 to 128 with:

  sudo ethtool -G enp101s0f0 rx 128

(Why reducing? I would expect an increase to help)

However the problem persists when exceeding 4 flows/cores, and a further reduction of the ring size doesn't help.
4 FLOWS:
Running XDP on dev:enp101s0f0 (ifindex:4) action:XDP_DROP options:no_touch
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      0       9841972     0
XDP-RX CPU      1       9842098     0
XDP-RX CPU      2       9842010     0
XDP-RX CPU      3       9842301     0
XDP-RX CPU      total   39368383

---

6 FLOWS:
Running XDP on dev:enp101s0f0 (ifindex:4) action:XDP_DROP options:no_touch
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      0       4470754     0
XDP-RX CPU      1       4470224     0
XDP-RX CPU      2       4468194     0
XDP-RX CPU      3       4470562     0
XDP-RX CPU      4       4470316     0
XDP-RX CPU      5       4467888     0
XDP-RX CPU      total   26817942

Federico
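Related to the "(Why reducing? I would expect an increase to help)" remark, the ring limits are easy to inspect before experimenting in the other direction (a sketch; the supported maximum depends on the NIC, and 1024 is just an example value):

  # show current and hardware-maximum ring sizes
  ethtool -g enp101s0f0

  # try a larger RX ring instead of a smaller one
  sudo ethtool -G enp101s0f0 rx 1024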