Federico Parola <fede.parola@xxxxxxxxxx> writes:

> Hello,
> I'm testing the performance of XDP when dropping packets using multiple
> cores and I'm getting unexpected results.
> My machine is equipped with a dual-port Intel XL710 40 GbE NIC and an
> Intel Xeon Gold 5120 CPU @ 2.20GHz with 14 cores (HyperThreading
> disabled), running Ubuntu server 18.04 with kernel 5.8.12.
> I'm using the xdp_rxq_info program from the kernel tree samples to drop
> packets.
> I generate 64-byte UDP packets with MoonGen for a total of 42 Mpps.
> Packets are uniformly distributed across different flows (different src
> port) and I use flow director rules on the rx NIC to steer these flows
> to different queues/cores.
> Here are my results:
>
> 1 FLOW:
> Running XDP on dev:enp101s0f0 (ifindex:3) action:XDP_DROP options:no_touch
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      0       17784270    0
> XDP-RX CPU      total   17784270
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index    0:0   17784270    0
> rx_queue_index    0:sum 17784270
> ---
>
> 2 FLOWS:
> Running XDP on dev:enp101s0f0 (ifindex:3) action:XDP_DROP options:no_touch
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      0       7016363     0
> XDP-RX CPU      1       7017291     0
> XDP-RX CPU      total   14033655
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index    0:0   7016366     0
> rx_queue_index    0:sum 7016366
> rx_queue_index    1:1   7017294     0
> rx_queue_index    1:sum 7017294
> ---
>
> 4 FLOWS:
> Running XDP on dev:enp101s0f0 (ifindex:3) action:XDP_DROP options:no_touch
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      0       2359478     0
> XDP-RX CPU      1       2358508     0
> XDP-RX CPU      2       2357042     0
> XDP-RX CPU      3       2355396     0
> XDP-RX CPU      total   9430425
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index    0:0   2359474     0
> rx_queue_index    0:sum 2359474
> rx_queue_index    1:1   2358504     0
> rx_queue_index    1:sum 2358504
> rx_queue_index    2:2   2357040     0
> rx_queue_index    2:sum 2357040
> rx_queue_index    3:3   2355392     0
> rx_queue_index    3:sum 2355392
>
> I don't understand why the overall performance decreases as the number
> of cores grows; according to [1] I would expect it to increase until
> reaching a maximum value. Is there any parameter I should tune to
> overcome the problem?

Yeah, this does look a bit odd. My immediate thought is that maybe your
RXQs are not pinned to the cores correctly? There is nothing in
xdp_rxq_info that ensures this; you have to configure the IRQ affinity
manually. If you don't do this, I suppose the processing could be
bouncing around on different CPUs, leading to cache line contention when
updating the stats map.

You can try to look at what the actual CPU load is on each core -
'mpstat -P ALL -n 1' is my go-to for this.

-Toke
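
For reference, one way of doing the pinning is to write the CPU index
into /proc/irq/<N>/smp_affinity_list for each queue interrupt. Below is
a minimal, untested sketch that assumes the i40e "TxRx" naming
convention in /proc/interrupts and a simple queue-N-to-CPU-N mapping;
adjust the match pattern to whatever your system actually shows, and
make sure irqbalance is not running or it will rewrite the affinities:

    # Untested sketch: pin each RX/TX queue interrupt of enp101s0f0 to
    # the CPU with the same index (queue 0 -> CPU 0, queue 1 -> CPU 1, ...).
    # Check /proc/interrupts first; the IRQ naming is driver-specific.
    systemctl stop irqbalance   # otherwise the affinities get overwritten

    cpu=0
    for irq in $(awk '/enp101s0f0-TxRx/ { sub(":", "", $1); print $1 }' /proc/interrupts); do
        echo $cpu > /proc/irq/$irq/smp_affinity_list
        cpu=$((cpu + 1))
    done

I believe the Intel out-of-tree driver package also ships a
scripts/set_irq_affinity helper that does roughly the same thing.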