On Wed, 19 Jun 2024 at 08:00, Tariq Toukan <tariqt@xxxxxxxxxx> wrote: > > Thanks for your report. > > I assume cpu util for the active core on the DUT is 100% in all cases, > right? Yes, that's correct. The irq is also on the core on the right numa node, and I have disabled CPU frequency scaling. > > Can you please share some more details? Like relevant ethtool counters, > and perf top output. > > We'll check if this repro for us as well. Sure, below you can find the reports for the XDP_DROP and XDP_TX cases. I am attaching only the ones for kern v5.15 vs v6.5. -------------------------------------------------- ethtool output (5.15) - Missing counters are zero -------------------------------------------------- NIC statistics: rx_packets: 333854100 rx_bytes: 20031246044 tx_packets: 25 tx_bytes: 2070 rx_csum_unnecessary: 333854079 rx_xdp_drop: 3753342954 rx_xdp_redirect: 0 rx_xdp_tx_xmit: 5582660674 rx_xdp_tx_mpwqe: 175018775 rx_xdp_tx_inlnw: 8970048 rx_xdp_tx_nops: 378338337 rx_xdp_tx_full: 0 rx_xdp_tx_err: 0 rx_xdp_tx_cqe: 87229072 rx_cache_reuse: 9369255040 rx_cache_full: 68 rx_cache_empty: 16153471 rx_cache_busy: 193 rx_cache_waive: 15864256 rx_congst_umr: 158 ch_events: 448 ch_poll: 151091830 ch_arm: 301 rx_out_of_buffer: 990473555 rx_if_down_packets: 67469721 rx_steer_missed_packets: 1962570491 rx_vport_unicast_packets: 38460159194 rx_vport_unicast_bytes: 2461450188460 tx_vport_unicast_packets: 5582654212 tx_vport_unicast_bytes: 334959252764 tx_packets_phy: 5588396729 rx_packets_phy: 97052087562 tx_bytes_phy: 357657403514 rx_bytes_phy: 6211329423080 tx_mac_control_phy: 5745055 tx_pause_ctrl_phy: 5745055 rx_discards_phy: 58591428329 tx_discards_phy: 0 tx_errors_phy: 0 rx_undersize_pkts_phy: 0 rx_fragments_phy: 0 rx_jabbers_phy: 0 rx_64_bytes_phy: 97052040472 rx_65_to_127_bytes_phy: 3 rx_128_to_255_bytes_phy: 0 rx_256_to_511_bytes_phy: 26 rx_512_to_1023_bytes_phy: 0 rx_1024_to_1518_bytes_phy: 0 rx_1519_to_2047_bytes_phy: 0 rx_2048_to_4095_bytes_phy: 0 rx_4096_to_8191_bytes_phy: 0 rx_8192_to_10239_bytes_phy: 0 rx_prio0_bytes: 6211318150440 rx_prio0_packets: 38460533605 rx_prio0_discards: 58591314012 tx_prio0_bytes: 357288052986 tx_prio0_packets: 5582625883 tx_global_pause: 5745042 tx_global_pause_duration: 771103810 ch0_events: 55 ch0_poll: 146981606 ch0_arm: 35 ch0_aff_change: 6 ch0_force_irq: 0 ch0_eq_rearm: 0 rx0_packets: 70812690 rx0_bytes: 4248761400 rx0_csum_complete: 0 rx0_csum_complete_tail: 0 rx0_csum_complete_tail_slow: 0 rx0_csum_unnecessary: 70812671 rx0_csum_unnecessary_inner: 0 rx0_csum_none: 19 rx0_xdp_drop: 3753342954 rx0_xdp_redirect: 0 rx0_lro_packets: 0 rx0_lro_bytes: 0 rx0_ecn_mark: 0 rx0_removed_vlan_packets: 0 rx0_wqe_err: 0 rx0_mpwqe_filler_cqes: 0 rx0_mpwqe_filler_strides: 0 rx0_oversize_pkts_sw_drop: 0 rx0_buff_alloc_err: 0 rx0_cqe_compress_blks: 0 rx0_cqe_compress_pkts: 0 rx0_cache_reuse: 9368316609 rx0_cache_full: 2 rx0_cache_empty: 11519 rx0_cache_busy: 0 rx0_cache_waive: 0 rx0_congst_umr: 158 rx0_arfs_err: 0 rx0_recover: 0 rx0_xdp_tx_xmit: 5582664928 rx0_xdp_tx_mpwqe: 175018908 rx0_xdp_tx_inlnw: 8970048 rx0_xdp_tx_nops: 378338623 rx0_xdp_tx_full: 0 rx0_xdp_tx_err: 0 rx0_xdp_tx_cqes: 87229139 -------------------------------------------------- perf top output (5.15) - XDP_DROP -------------------------------------------------- 19.27% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear 11.74% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq 9.82% [kernel] [k] mlx5e_xdp_handle 9.43% [kernel] [k] mlx5e_alloc_rx_mpwqe 9.29% bpf_prog_xdp_basic_prog [k] bpf_prog_5f76c01f0ff23233_xdp_basic_prog 7.06% [kernel] [k] mlx5e_page_release_dynamic 6.95% [kernel] [k] mlx5e_poll_rx_cq 5.89% [kernel] [k] dma_sync_single_for_cpu 5.21% [kernel] [k] dma_sync_single_for_device 4.12% [kernel] [k] mlx5e_free_rx_mpwqe 1.65% [kernel] [k] mlx5e_poll_ico_cq 1.60% [kernel] [k] mlx5e_napi_poll 1.59% [kernel] [k] bpf_get_smp_processor_id 0.94% [kernel] [k] bpf_dispatcher_xdp_func 0.91% [kernel] [k] net_rx_action 0.90% bpf_prog_xdp_dispatcher [k] bpf_prog_17d608957d1f805a_xdp_dispatcher 0.90% [kernel] [k] bpf_dispatcher_xdp 0.64% [kernel] [k] mlx5e_post_rx_mpwqes 0.64% [kernel] [k] mlx5e_poll_xdpsq_cq 0.37% [kernel] [k] __softirqentry_text_start -------------------------------------------------- perf top output (5.15) - XDP_TX -------------------------------------------------- 13.84% bpf_prog_xdp_swap_macs_prog [k] bpf_prog_0a3ad412f28cbb6d_xdp_swap_macs_prog 11.43% [kernel] [k] mlx5e_xmit_xdp_buff 10.69% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear 9.79% [kernel] [k] mlx5e_xmit_xdp_frame_mpwqe 8.35% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq 6.34% [kernel] [k] dma_sync_single_for_device 6.20% [kernel] [k] mlx5e_poll_rx_cq 5.62% [kernel] [k] mlx5e_page_release_dynamic 5.33% [kernel] [k] mlx5e_xdp_handle 5.21% [kernel] [k] mlx5e_alloc_rx_mpwqe 4.47% [kernel] [k] mlx5e_free_xdpsq_desc 3.26% [kernel] [k] dma_sync_single_for_cpu 1.47% [kernel] [k] mlx5e_xmit_xdp_frame_check_mpwqe 1.22% [kernel] [k] mlx5e_poll_xdpsq_cq 0.95% [kernel] [k] net_rx_action 0.90% [kernel] [k] bpf_get_smp_processor_id 0.80% [kernel] [k] mlx5e_napi_poll 0.69% [kernel] [k] mlx5e_xdp_mpwqe_session_start 0.63% [kernel] [k] mlx5e_poll_ico_cq 0.49% [kernel] [k] bpf_dispatcher_xdp 0.47% [kernel] [k] bpf_dispatcher_xdp_func --------------------------------------------------------------------------------------- -------------------------------------------------- ethtool output (6.5) - Missing counters are zero -------------------------------------------------- NIC statistics: rx_packets: 7282880 rx_bytes: 436973482 tx_packets: 42 tx_bytes: 3556 rx_csum_unnecessary: 7282816 rx_xdp_drop: 7783331724 rx_xdp_redirect: 0 rx_xdp_tx_xmit: 46956452544 rx_xdp_tx_mpwqe: 4401807536 rx_xdp_tx_inlnw: 46951234092 rx_xdp_tx_nops: 4988835176 rx_xdp_tx_full: 0 rx_xdp_tx_err: 0 rx_xdp_tx_cqe: 733694572 rx_pp_alloc_fast: 3641784 rx_pp_alloc_slow: 8 rx_pp_alloc_slow_high_order: 0 rx_pp_alloc_empty: 8 rx_pp_alloc_refill: 0 rx_pp_alloc_waive: 0 rx_pp_recycle_cached: 3641280 ch_events: 505 ch_poll: 855423286 rx_out_of_buffer: 534918379 rx_if_down_packets: 4044804 rx_steer_missed_packets: 298 rx_vport_unicast_packets: 287214261626 rx_vport_unicast_bytes: 18381712744116 tx_vport_unicast_packets: 46956452544 tx_vport_unicast_bytes: 2817387157674 tx_packets_phy: 47000866603 rx_packets_phy: 728277471186 tx_bytes_phy: 3008055468662 rx_bytes_phy: 46609758231313 tx_mac_control_phy: 44414017 tx_pause_ctrl_phy: 44414017 rx_discards_phy: 441063206498 rx_64_bytes_phy: 728277470842 rx_65_to_127_bytes_phy: 133 rx_128_to_255_bytes_phy: 0 rx_256_to_511_bytes_phy: 211 rx_512_to_1023_bytes_phy: 0 rx_1024_to_1518_bytes_phy: 0 rx_1519_to_2047_bytes_phy: 0 rx_2048_to_4095_bytes_phy: 0 rx_4096_to_8191_bytes_phy: 0 rx_8192_to_10239_bytes_phy: 0 rx_buffer_passed_thres_phy: 1192226 rx_prio0_bytes: 46609758231313 rx_prio0_packets: 287214264688 rx_prio0_discards: 441063206498 tx_prio0_bytes: 3005212971574 tx_prio0_packets: 46956452586 tx_global_pause: 44414017 tx_global_pause_duration: 5961284324 ch0_events: 120 ch0_poll: 855423025 ch0_arm: 100 ch0_aff_change: 0 ch0_force_irq: 0 ch0_eq_rearm: 0 rx0_packets: 7282880 rx0_bytes: 436973482 rx0_csum_complete: 0 rx0_csum_complete_tail: 0 rx0_csum_complete_tail_slow: 0 rx0_csum_unnecessary: 7282816 rx0_csum_unnecessary_inner: 0 rx0_csum_none: 64 rx0_xdp_drop: 7783331724 rx0_xdp_redirect: 0 rx0_lro_packets: 0 rx0_lro_bytes: 0 rx0_gro_packets: 0 rx0_gro_bytes: 0 rx0_gro_skbs: 0 rx0_gro_match_packets: 0 rx0_gro_large_hds: 0 rx0_ecn_mark: 0 rx0_removed_vlan_packets: 0 rx0_wqe_err: 0 rx0_mpwqe_filler_cqes: 0 rx0_mpwqe_filler_strides: 0 rx0_oversize_pkts_sw_drop: 0 rx0_buff_alloc_err: 0 rx0_cqe_compress_blks: 0 rx0_cqe_compress_pkts: 0 rx0_congst_umr: 0 rx0_arfs_err: 0 rx0_recover: 0 rx0_pp_alloc_fast: 3641784 rx0_pp_alloc_slow: 8 rx0_pp_alloc_slow_high_order: 0 rx0_pp_alloc_empty: 8 rx0_pp_alloc_refill: 0 rx0_pp_alloc_waive: 0 rx0_pp_recycle_cached: 3641280 rx0_pp_recycle_cache_full: 0 rx0_pp_recycle_ring: 0 rx0_pp_recycle_ring_full: 0 rx0_pp_recycle_released_ref: 0 rx0_xdp_tx_xmit: 46956452544 rx0_xdp_tx_mpwqe: 4401807536 rx0_xdp_tx_inlnw: 46951234092 rx0_xdp_tx_nops: 4988835176 rx0_xdp_tx_full: 0 rx0_xdp_tx_err: 0 rx0_xdp_tx_cqes: 733694572 -------------------------------------------------- perf top output (6.5) - XDP_DROP -------------------------------------------------- 27.63% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear 12.61% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq 8.38% [kernel] [k] mlx5e_rx_cq_process_basic_cqe_comp 7.06% [kernel] [k] page_pool_put_defragged_page 6.45% [kernel] [k] mlx5e_xdp_handle 5.36% bpf_prog_xdp_basic_prog [k] bpf_prog_5f76c01f0ff23233_xdp_basic_prog 4.95% [kernel] [k] dma_sync_single_for_device 4.89% [kernel] [k] page_pool_alloc_pages 4.36% [kernel] [k] mlx5e_alloc_rx_mpwqe 3.70% [kernel] [k] dma_sync_single_for_cpu 2.71% [kernel] [k] mlx5e_page_release_fragmented.isra.0 2.09% [kernel] [k] bpf_dispatcher_xdp_func 1.95% [kernel] [k] mlx5e_free_rx_mpwqe 1.10% [kernel] [k] mlx5e_poll_ico_cq 1.07% [kernel] [k] bpf_get_smp_processor_id 1.05% [kernel] [k] mlx5e_napi_poll 0.85% [kernel] [k] mlx5e_poll_xdpsq_cq 0.61% [kernel] [k] net_rx_action 0.58% bpf_prog_xdp_dispatcher [k] bpf_prog_17d608957d1f805a_xdp_dispatcher 0.57% [kernel] [k] bpf_dispatcher_xdp 0.53% [kernel] [k] mlx5e_post_rx_mpwqes 0.27% [kernel] [k] __do_softirq 0.25% [kernel] [k] mlx5e_poll_tx_cq -------------------------------------------------- perf top output (6.5) - XDP_TX -------------------------------------------------- 19.60% [kernel] [k] mlx5e_xdp_mpwqe_add_dseg 14.61% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear 11.55% [kernel] [k] mlx5e_xmit_xdp_buff 5.85% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq 5.73% bpf_prog_xdp_swap_macs_prog [k] bpf_prog_0a3a_xdp_swap_macs_prog 5.09% [kernel] [k] mlx5e_free_xdpsq_desc 5.08% [kernel] [k] dma_sync_single_for_device 4.66% [kernel] [k] mlx5e_xmit_xdp_frame_mpwqe 3.64% [kernel] [k] mlx5e_rx_cq_process_basic_cqe_comp 3.34% [kernel] [k] page_pool_put_defragged_page 3.04% [kernel] [k] mlx5e_xdp_handle 3.03% [kernel] [k] mlx5e_page_release_fragmented.isra.0 2.56% [kernel] [k] dma_sync_single_for_cpu 2.15% [kernel] [k] mlx5e_alloc_rx_mpwqe 1.96% [kernel] [k] page_pool_alloc_pages 1.06% [kernel] [k] mlx5e_xmit_xdp_frame_check_mpwqe 1.02% [kernel] [k] bpf_dispatcher_xdp_func 1.01% [kernel] [k] mlx5e_free_rx_mpwqe 0.84% [kernel] [k] mlx5e_poll_xdpsq_cq 0.62% [kernel] [k] mlx5e_xdpsq_get_next_pi 0.53% [kernel] [k] mlx5e_poll_ico_cq 0.48% [kernel] [k] bpf_get_smp_processor_id 0.48% [kernel] [k] net_rx_action 0.36% [kernel] [k] mlx5e_napi_poll 0.32% [kernel] [k] mlx5e_xdp_mpwqe_complete 0.25% [kernel] [k] bpf_dispatcher_xdp 0.22% bpf_prog_xdp_dispatcher [k] bpf_prog_17d6_xdp_dispatcher 0.21% [kernel] [k] mlx5e_post_rx_mpwqes 0.11% [kernel] [k] __do_softirq