Re: XDP Performance Regression in recent kernel versions

On Wed, 19 Jun 2024 at 08:00, Tariq Toukan <tariqt@xxxxxxxxxx> wrote:
>
> Thanks for your report.
>
> I assume cpu util for the active core on the DUT is 100% in all cases,
> right?

Yes, that's correct.
The IRQ is also on a core on the correct NUMA node, and I have
disabled CPU frequency scaling.

>
> Can you please share some more details? Like relevant ethtool counters,
> and perf top output.
>
> We'll check if this repro for us as well.

Sure, below you can find the reports for the XDP_DROP and XDP_TX cases.
I am attaching only the ones for kernel v5.15 vs. v6.5.

--------------------------------------------------
ethtool output (5.15) - Missing counters are zero
--------------------------------------------------
NIC statistics:
     rx_packets: 333854100
     rx_bytes: 20031246044
     tx_packets: 25
     tx_bytes: 2070
     rx_csum_unnecessary: 333854079
     rx_xdp_drop: 3753342954
     rx_xdp_redirect: 0
     rx_xdp_tx_xmit: 5582660674
     rx_xdp_tx_mpwqe: 175018775
     rx_xdp_tx_inlnw: 8970048
     rx_xdp_tx_nops: 378338337
     rx_xdp_tx_full: 0
     rx_xdp_tx_err: 0
     rx_xdp_tx_cqe: 87229072
     rx_cache_reuse: 9369255040
     rx_cache_full: 68
     rx_cache_empty: 16153471
     rx_cache_busy: 193
     rx_cache_waive: 15864256
     rx_congst_umr: 158
     ch_events: 448
     ch_poll: 151091830
     ch_arm: 301
     rx_out_of_buffer: 990473555
     rx_if_down_packets: 67469721
     rx_steer_missed_packets: 1962570491
     rx_vport_unicast_packets: 38460159194
     rx_vport_unicast_bytes: 2461450188460
     tx_vport_unicast_packets: 5582654212
     tx_vport_unicast_bytes: 334959252764
     tx_packets_phy: 5588396729
     rx_packets_phy: 97052087562
     tx_bytes_phy: 357657403514
     rx_bytes_phy: 6211329423080
     tx_mac_control_phy: 5745055
     tx_pause_ctrl_phy: 5745055
     rx_discards_phy: 58591428329
     tx_discards_phy: 0
     tx_errors_phy: 0
     rx_undersize_pkts_phy: 0
     rx_fragments_phy: 0
     rx_jabbers_phy: 0
     rx_64_bytes_phy: 97052040472
     rx_65_to_127_bytes_phy: 3
     rx_128_to_255_bytes_phy: 0
     rx_256_to_511_bytes_phy: 26
     rx_512_to_1023_bytes_phy: 0
     rx_1024_to_1518_bytes_phy: 0
     rx_1519_to_2047_bytes_phy: 0
     rx_2048_to_4095_bytes_phy: 0
     rx_4096_to_8191_bytes_phy: 0
     rx_8192_to_10239_bytes_phy: 0
     rx_prio0_bytes: 6211318150440
     rx_prio0_packets: 38460533605
     rx_prio0_discards: 58591314012
     tx_prio0_bytes: 357288052986
     tx_prio0_packets: 5582625883
     tx_global_pause: 5745042
     tx_global_pause_duration: 771103810
     ch0_events: 55
     ch0_poll: 146981606
     ch0_arm: 35
     ch0_aff_change: 6
     ch0_force_irq: 0
     ch0_eq_rearm: 0
     rx0_packets: 70812690
     rx0_bytes: 4248761400
     rx0_csum_complete: 0
     rx0_csum_complete_tail: 0
     rx0_csum_complete_tail_slow: 0
     rx0_csum_unnecessary: 70812671
     rx0_csum_unnecessary_inner: 0
     rx0_csum_none: 19
     rx0_xdp_drop: 3753342954
     rx0_xdp_redirect: 0
     rx0_lro_packets: 0
     rx0_lro_bytes: 0
     rx0_ecn_mark: 0
     rx0_removed_vlan_packets: 0
     rx0_wqe_err: 0
     rx0_mpwqe_filler_cqes: 0
     rx0_mpwqe_filler_strides: 0
     rx0_oversize_pkts_sw_drop: 0
     rx0_buff_alloc_err: 0
     rx0_cqe_compress_blks: 0
     rx0_cqe_compress_pkts: 0
     rx0_cache_reuse: 9368316609
     rx0_cache_full: 2
     rx0_cache_empty: 11519
     rx0_cache_busy: 0
     rx0_cache_waive: 0
     rx0_congst_umr: 158
     rx0_arfs_err: 0
     rx0_recover: 0
     rx0_xdp_tx_xmit: 5582664928
     rx0_xdp_tx_mpwqe: 175018908
     rx0_xdp_tx_inlnw: 8970048
     rx0_xdp_tx_nops: 378338623
     rx0_xdp_tx_full: 0
     rx0_xdp_tx_err: 0
     rx0_xdp_tx_cqes: 87229139

--------------------------------------------------
perf top output (5.15) - XDP_DROP
--------------------------------------------------
19.27%  [kernel]                  [k] mlx5e_skb_from_cqe_mpwrq_linear
11.74%  [kernel]                  [k] mlx5e_handle_rx_cqe_mpwrq
9.82%   [kernel]                  [k] mlx5e_xdp_handle
9.43%   [kernel]                  [k] mlx5e_alloc_rx_mpwqe
9.29%   bpf_prog_xdp_basic_prog   [k] bpf_prog_5f76c01f0ff23233_xdp_basic_prog
7.06%   [kernel]                  [k] mlx5e_page_release_dynamic
6.95%   [kernel]                  [k] mlx5e_poll_rx_cq
5.89%   [kernel]                  [k] dma_sync_single_for_cpu
5.21%   [kernel]                  [k] dma_sync_single_for_device
4.12%   [kernel]                  [k] mlx5e_free_rx_mpwqe
1.65%   [kernel]                  [k] mlx5e_poll_ico_cq
1.60%   [kernel]                  [k] mlx5e_napi_poll
1.59%   [kernel]                  [k] bpf_get_smp_processor_id
0.94%   [kernel]                  [k] bpf_dispatcher_xdp_func
0.91%   [kernel]                  [k] net_rx_action
0.90%   bpf_prog_xdp_dispatcher   [k] bpf_prog_17d608957d1f805a_xdp_dispatcher
0.90%   [kernel]                  [k] bpf_dispatcher_xdp
0.64%   [kernel]                  [k] mlx5e_post_rx_mpwqes
0.64%   [kernel]                  [k] mlx5e_poll_xdpsq_cq
0.37%   [kernel]                  [k] __softirqentry_text_start

--------------------------------------------------
perf top output (5.15) - XDP_TX
--------------------------------------------------
13.84%  bpf_prog_xdp_swap_macs_prog  [k] bpf_prog_0a3ad412f28cbb6d_xdp_swap_macs_prog
11.43%  [kernel]                     [k] mlx5e_xmit_xdp_buff
10.69%  [kernel]                     [k] mlx5e_skb_from_cqe_mpwrq_linear
9.79%  [kernel]                      [k] mlx5e_xmit_xdp_frame_mpwqe
8.35%  [kernel]                      [k] mlx5e_handle_rx_cqe_mpwrq
6.34%  [kernel]                      [k] dma_sync_single_for_device
6.20%  [kernel]                      [k] mlx5e_poll_rx_cq
5.62%  [kernel]                      [k] mlx5e_page_release_dynamic
5.33%  [kernel]                      [k] mlx5e_xdp_handle
5.21%  [kernel]                      [k] mlx5e_alloc_rx_mpwqe
4.47%  [kernel]                      [k] mlx5e_free_xdpsq_desc
3.26%  [kernel]                      [k] dma_sync_single_for_cpu
1.47%  [kernel]                      [k] mlx5e_xmit_xdp_frame_check_mpwqe
1.22%  [kernel]                      [k] mlx5e_poll_xdpsq_cq
0.95%  [kernel]                      [k] net_rx_action
0.90%  [kernel]                      [k] bpf_get_smp_processor_id
0.80%  [kernel]                      [k] mlx5e_napi_poll
0.69%  [kernel]                      [k] mlx5e_xdp_mpwqe_session_start
0.63%  [kernel]                      [k] mlx5e_poll_ico_cq
0.49%  [kernel]                      [k] bpf_dispatcher_xdp
0.47%  [kernel]                      [k] bpf_dispatcher_xdp_func

---------------------------------------------------------------------------------------

--------------------------------------------------
ethtool output (6.5) - Missing counters are zero
--------------------------------------------------
NIC statistics:
     rx_packets: 7282880
     rx_bytes: 436973482
     tx_packets: 42
     tx_bytes: 3556
     rx_csum_unnecessary: 7282816
     rx_xdp_drop: 7783331724
     rx_xdp_redirect: 0
     rx_xdp_tx_xmit: 46956452544
     rx_xdp_tx_mpwqe: 4401807536
     rx_xdp_tx_inlnw: 46951234092
     rx_xdp_tx_nops: 4988835176
     rx_xdp_tx_full: 0
     rx_xdp_tx_err: 0
     rx_xdp_tx_cqe: 733694572
     rx_pp_alloc_fast: 3641784
     rx_pp_alloc_slow: 8
     rx_pp_alloc_slow_high_order: 0
     rx_pp_alloc_empty: 8
     rx_pp_alloc_refill: 0
     rx_pp_alloc_waive: 0
     rx_pp_recycle_cached: 3641280
     ch_events: 505
     ch_poll: 855423286
     rx_out_of_buffer: 534918379
     rx_if_down_packets: 4044804
     rx_steer_missed_packets: 298
     rx_vport_unicast_packets: 287214261626
     rx_vport_unicast_bytes: 18381712744116
     tx_vport_unicast_packets: 46956452544
     tx_vport_unicast_bytes: 2817387157674
     tx_packets_phy: 47000866603
     rx_packets_phy: 728277471186
     tx_bytes_phy: 3008055468662
     rx_bytes_phy: 46609758231313
     tx_mac_control_phy: 44414017
     tx_pause_ctrl_phy: 44414017
     rx_discards_phy: 441063206498
     rx_64_bytes_phy: 728277470842
     rx_65_to_127_bytes_phy: 133
     rx_128_to_255_bytes_phy: 0
     rx_256_to_511_bytes_phy: 211
     rx_512_to_1023_bytes_phy: 0
     rx_1024_to_1518_bytes_phy: 0
     rx_1519_to_2047_bytes_phy: 0
     rx_2048_to_4095_bytes_phy: 0
     rx_4096_to_8191_bytes_phy: 0
     rx_8192_to_10239_bytes_phy: 0
     rx_buffer_passed_thres_phy: 1192226
     rx_prio0_bytes: 46609758231313
     rx_prio0_packets: 287214264688
     rx_prio0_discards: 441063206498
     tx_prio0_bytes: 3005212971574
     tx_prio0_packets: 46956452586
     tx_global_pause: 44414017
     tx_global_pause_duration: 5961284324
     ch0_events: 120
     ch0_poll: 855423025
     ch0_arm: 100
     ch0_aff_change: 0
     ch0_force_irq: 0
     ch0_eq_rearm: 0
     rx0_packets: 7282880
     rx0_bytes: 436973482
     rx0_csum_complete: 0
     rx0_csum_complete_tail: 0
     rx0_csum_complete_tail_slow: 0
     rx0_csum_unnecessary: 7282816
     rx0_csum_unnecessary_inner: 0
     rx0_csum_none: 64
     rx0_xdp_drop: 7783331724
     rx0_xdp_redirect: 0
     rx0_lro_packets: 0
     rx0_lro_bytes: 0
     rx0_gro_packets: 0
     rx0_gro_bytes: 0
     rx0_gro_skbs: 0
     rx0_gro_match_packets: 0
     rx0_gro_large_hds: 0
     rx0_ecn_mark: 0
     rx0_removed_vlan_packets: 0
     rx0_wqe_err: 0
     rx0_mpwqe_filler_cqes: 0
     rx0_mpwqe_filler_strides: 0
     rx0_oversize_pkts_sw_drop: 0
     rx0_buff_alloc_err: 0
     rx0_cqe_compress_blks: 0
     rx0_cqe_compress_pkts: 0
     rx0_congst_umr: 0
     rx0_arfs_err: 0
     rx0_recover: 0
     rx0_pp_alloc_fast: 3641784
     rx0_pp_alloc_slow: 8
     rx0_pp_alloc_slow_high_order: 0
     rx0_pp_alloc_empty: 8
     rx0_pp_alloc_refill: 0
     rx0_pp_alloc_waive: 0
     rx0_pp_recycle_cached: 3641280
     rx0_pp_recycle_cache_full: 0
     rx0_pp_recycle_ring: 0
     rx0_pp_recycle_ring_full: 0
     rx0_pp_recycle_released_ref: 0
     rx0_xdp_tx_xmit: 46956452544
     rx0_xdp_tx_mpwqe: 4401807536
     rx0_xdp_tx_inlnw: 46951234092
     rx0_xdp_tx_nops: 4988835176
     rx0_xdp_tx_full: 0
     rx0_xdp_tx_err: 0
     rx0_xdp_tx_cqes: 733694572

--------------------------------------------------
perf top output (6.5) - XDP_DROP
--------------------------------------------------
27.63%  [kernel]                [k] mlx5e_skb_from_cqe_mpwrq_linear
12.61%  [kernel]                [k] mlx5e_handle_rx_cqe_mpwrq
8.38%  [kernel]                 [k] mlx5e_rx_cq_process_basic_cqe_comp
7.06%  [kernel]                 [k] page_pool_put_defragged_page
6.45%  [kernel]                 [k] mlx5e_xdp_handle
5.36%  bpf_prog_xdp_basic_prog  [k] bpf_prog_5f76c01f0ff23233_xdp_basic_prog
4.95%  [kernel]                 [k] dma_sync_single_for_device
4.89%  [kernel]                 [k] page_pool_alloc_pages
4.36%  [kernel]                 [k] mlx5e_alloc_rx_mpwqe
3.70%  [kernel]                 [k] dma_sync_single_for_cpu
2.71%  [kernel]                 [k] mlx5e_page_release_fragmented.isra.0
2.09%  [kernel]                 [k] bpf_dispatcher_xdp_func
1.95%  [kernel]                 [k] mlx5e_free_rx_mpwqe
1.10%  [kernel]                 [k] mlx5e_poll_ico_cq
1.07%  [kernel]                 [k] bpf_get_smp_processor_id
1.05%  [kernel]                 [k] mlx5e_napi_poll
0.85%  [kernel]                 [k] mlx5e_poll_xdpsq_cq
0.61%  [kernel]                 [k] net_rx_action
0.58%  bpf_prog_xdp_dispatcher  [k] bpf_prog_17d608957d1f805a_xdp_dispatcher
0.57%  [kernel]                 [k] bpf_dispatcher_xdp
0.53%  [kernel]                 [k] mlx5e_post_rx_mpwqes
0.27%  [kernel]                 [k] __do_softirq
0.25%  [kernel]                 [k] mlx5e_poll_tx_cq

--------------------------------------------------
perf top output (6.5) - XDP_TX
--------------------------------------------------
19.60%  [kernel]                    [k] mlx5e_xdp_mpwqe_add_dseg
14.61%  [kernel]                    [k] mlx5e_skb_from_cqe_mpwrq_linear
11.55%  [kernel]                    [k] mlx5e_xmit_xdp_buff
5.85%  [kernel]                     [k] mlx5e_handle_rx_cqe_mpwrq
5.73%  bpf_prog_xdp_swap_macs_prog  [k] bpf_prog_0a3a_xdp_swap_macs_prog
5.09%  [kernel]                     [k] mlx5e_free_xdpsq_desc
5.08%  [kernel]                     [k] dma_sync_single_for_device
4.66%  [kernel]                     [k] mlx5e_xmit_xdp_frame_mpwqe
3.64%  [kernel]                     [k] mlx5e_rx_cq_process_basic_cqe_comp
3.34%  [kernel]                     [k] page_pool_put_defragged_page
3.04%  [kernel]                     [k] mlx5e_xdp_handle
3.03%  [kernel]                     [k] mlx5e_page_release_fragmented.isra.0
2.56%  [kernel]                     [k] dma_sync_single_for_cpu
2.15%  [kernel]                     [k] mlx5e_alloc_rx_mpwqe
1.96%  [kernel]                     [k] page_pool_alloc_pages
1.06%  [kernel]                     [k] mlx5e_xmit_xdp_frame_check_mpwqe
1.02%  [kernel]                     [k] bpf_dispatcher_xdp_func
1.01%  [kernel]                     [k] mlx5e_free_rx_mpwqe
0.84%  [kernel]                     [k] mlx5e_poll_xdpsq_cq
0.62%  [kernel]                     [k] mlx5e_xdpsq_get_next_pi
0.53%  [kernel]                     [k] mlx5e_poll_ico_cq
0.48%  [kernel]                     [k] bpf_get_smp_processor_id
0.48%  [kernel]                     [k] net_rx_action
0.36%  [kernel]                     [k] mlx5e_napi_poll
0.32%  [kernel]                     [k] mlx5e_xdp_mpwqe_complete
0.25%  [kernel]                     [k] bpf_dispatcher_xdp
0.22%  bpf_prog_xdp_dispatcher      [k] bpf_prog_17d6_xdp_dispatcher
0.21%  [kernel]                     [k] mlx5e_post_rx_mpwqes
0.11%  [kernel]                     [k] __do_softirq
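
--------------------------------------------------
Sketch: comparing the two ethtool dumps
--------------------------------------------------
For anyone who wants to compare the counter dumps above side by side,
here is a minimal Python sketch. It parses "name: value" lines in the
format printed by ethtool -S and reports a ratio between two counters.
The helper names (parse_ethtool_stats, ratio, only_in) are hypothetical,
and the choice of rx_xdp_tx_inlnw over rx_xdp_tx_xmit is only an
illustration, not a diagnosis. The excerpts are copied from the dumps
above.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines (ethtool -S style) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skip headers such as "NIC statistics:" and any non-numeric lines.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def ratio(stats, numerator, denominator):
    """Return stats[numerator] / stats[denominator], or 0.0 if undefined."""
    den = stats.get(denominator, 0)
    return stats.get(numerator, 0) / den if den else 0.0


def only_in(a, b):
    """Counter names present in dump a but absent from dump b."""
    return sorted(set(a) - set(b))


# Excerpts copied from the 5.15 and 6.5 dumps above.
DUMP_5_15 = """
NIC statistics:
     rx_xdp_tx_xmit: 5582660674
     rx_xdp_tx_mpwqe: 175018775
     rx_xdp_tx_inlnw: 8970048
     rx_cache_reuse: 9369255040
"""

DUMP_6_5 = """
NIC statistics:
     rx_xdp_tx_xmit: 46956452544
     rx_xdp_tx_mpwqe: 4401807536
     rx_xdp_tx_inlnw: 46951234092
     rx_pp_alloc_fast: 3641784
"""

old = parse_ethtool_stats(DUMP_5_15)
new = parse_ethtool_stats(DUMP_6_5)

for label, stats in (("5.15", old), ("6.5", new)):
    share = ratio(stats, "rx_xdp_tx_inlnw", "rx_xdp_tx_xmit")
    print(f"{label}: rx_xdp_tx_inlnw / rx_xdp_tx_xmit = {share:.4%}")

print("only in 5.15:", only_in(old, new))
print("only in 6.5: ", only_in(new, old))
```

To run it on full dumps, replace the two excerpts with the complete
ethtool -S output for each kernel.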



