Federico Parola <fede.parola@xxxxxxxxxx> writes:

> Hello,
> I'm testing the performance of XDP when dropping packets using multiple
> cores and I'm getting unexpected results.
> My machine is equipped with a dual-port Intel XL710 40 GbE NIC and an
> Intel Xeon Gold 5120 CPU @ 2.20GHz with 14 cores (HyperThreading
> disabled), running Ubuntu server 18.04 with kernel 5.8.12.
> I'm using the xdp_rxq_info program from the kernel tree samples to drop
> packets.
> I generate 64-byte UDP packets with MoonGen for a total of 42 Mpps.
> Packets are uniformly distributed across different flows (different src
> port) and I use flow director rules on the rx NIC to steer these flows
> to different queues/cores.
> Here are my results:
>
> 1 FLOW:
> Running XDP on dev:enp101s0f0 (ifindex:3) action:XDP_DROP options:no_touch
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      0       17784270    0
> XDP-RX CPU      total   17784270
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index    0:0   17784270    0
> rx_queue_index    0:sum 17784270
> ---
>
> 2 FLOWS:
> Running XDP on dev:enp101s0f0 (ifindex:3) action:XDP_DROP options:no_touch
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      0       7016363     0
> XDP-RX CPU      1       7017291     0
> XDP-RX CPU      total   14033655
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index    0:0   7016366     0
> rx_queue_index    0:sum 7016366
> rx_queue_index    1:1   7017294     0
> rx_queue_index    1:sum 7017294
> ---
>
> 4 FLOWS:
> Running XDP on dev:enp101s0f0 (ifindex:3) action:XDP_DROP options:no_touch
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      0       2359478     0
> XDP-RX CPU      1       2358508     0
> XDP-RX CPU      2       2357042     0
> XDP-RX CPU      3       2355396     0
> XDP-RX CPU      total   9430425
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index    0:0   2359474     0
> rx_queue_index    0:sum 2359474
> rx_queue_index    1:1   2358504     0
> rx_queue_index    1:sum 2358504
> rx_queue_index    2:2   2357040     0
> rx_queue_index    2:sum 2357040
> rx_queue_index    3:3   2355392     0
> rx_queue_index    3:sum 2355392
>
> I don't understand why the overall performance decreases as the number
> of cores grows; according to [1] I would expect it to increase until
> reaching a maximum value. Is there any parameter I should tune to
> overcome the problem?

Yeah, this does look a bit odd. My immediate thought is that maybe your
RXQs are not pinned to the cores correctly? There is nothing in
xdp_rxq_info that ensures this; you have to configure the IRQ affinity
manually. If you don't do this, I suppose the processing could be
bouncing around on different CPUs, leading to cache line contention when
updating the stats map.

You can try to look at what the actual CPU load is on each core -
'mpstat -P ALL -n 1' is my go-to for this.

-Toke
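
For reference, one way of doing the pinning is to write the CPU index
into /proc/irq/<N>/smp_affinity_list for each queue interrupt. Below is
a minimal, untested sketch that assumes the i40e "TxRx" naming
convention in /proc/interrupts and a simple queue-N-to-CPU-N mapping;
adjust the match pattern to whatever your system actually shows, and
make sure irqbalance is not running or it will rewrite the affinities:

    # Untested sketch: pin each RX/TX queue interrupt of enp101s0f0 to
    # the CPU with the same index (queue 0 -> CPU 0, queue 1 -> CPU 1, ...).
    # Check /proc/interrupts first; the IRQ naming is driver-specific.
    systemctl stop irqbalance   # otherwise the affinities get overwritten

    cpu=0
    for irq in $(awk '/enp101s0f0-TxRx/ { sub(":", "", $1); print $1 }' /proc/interrupts); do
        echo $cpu > /proc/irq/$irq/smp_affinity_list
        cpu=$((cpu + 1))
    done

I believe the Intel out-of-tree driver package also ships a
scripts/set_irq_affinity helper that does roughly the same thing.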