On 2024-02-20 13:57:24 [+0100], Jesper Dangaard Brouer wrote:
> > so I replaced nr_cpu_ids with 64 and bootet maxcpus=64 so that I can run
> > xdp-bench on the ixgbe.
> >
>
> Yes, ixgbe HW have limited TX queues, and XDP tries to allocate a
> hardware TX queue for every CPU in the system. So, I guess you have too
> many CPUs in your system - lol.
>
> Other drivers have a fallback to a locked XDP TX path, so this is also
> something to lookout for in the machine with i40e.

This locked XDP TX path starts at 64 CPUs, but XDP programs are rejected
above 64 * 2 CPUs (see the sketch further down).

> > so. i40 send, ixgbe receive.
> >
> > -t 2
> >
> > | Summary                  2,348,800 rx/s                 0 err/s
> > |   receive total          2,348,800 pkt/s        2,348,800 drop/s               0 error/s
> > |     cpu:0                2,348,800 pkt/s        2,348,800 drop/s               0 error/s
> > |   xdp_exception                  0 hit/s
> >
>
> This is way too low, with i40e sending.
>
> On my system with only -t 1 my i40e driver can send with approx 15Mpps:
>
>  Ethtool(i40e2   ) stat: 15028585 ( 15,028,585) <= tx-0.packets /sec
>  Ethtool(i40e2   ) stat: 15028589 ( 15,028,589) <= tx_packets /sec

-t1 in ixgbe:

Show adapter(s) (eth1) statistics (ONLY that changed!)
Ethtool(eth1    ) stat:    107857263 (    107,857,263) <= tx_bytes /sec
Ethtool(eth1    ) stat:    115047684 (    115,047,684) <= tx_bytes_nic /sec
Ethtool(eth1    ) stat:      1797621 (      1,797,621) <= tx_packets /sec
Ethtool(eth1    ) stat:      1797636 (      1,797,636) <= tx_pkts_nic /sec
Ethtool(eth1    ) stat:    107857263 (    107,857,263) <= tx_queue_0_bytes /sec
Ethtool(eth1    ) stat:      1797621 (      1,797,621) <= tx_queue_0_packets /sec

-t1 in i40e:

Ethtool(eno2np1 ) stat:           90 (             90) <= port.rx_bytes /sec
Ethtool(eno2np1 ) stat:            1 (              1) <= port.rx_size_127 /sec
Ethtool(eno2np1 ) stat:            1 (              1) <= port.rx_unicast /sec
Ethtool(eno2np1 ) stat:     79554379 (     79,554,379) <= port.tx_bytes /sec
Ethtool(eno2np1 ) stat:      1243037 (      1,243,037) <= port.tx_size_64 /sec
Ethtool(eno2np1 ) stat:      1243037 (      1,243,037) <= port.tx_unicast /sec
Ethtool(eno2np1 ) stat:           86 (             86) <= rx-32.bytes /sec
Ethtool(eno2np1 ) stat:            1 (              1) <= rx-32.packets /sec
Ethtool(eno2np1 ) stat:           86 (             86) <= rx_bytes /sec
Ethtool(eno2np1 ) stat:            1 (              1) <= rx_cache_waive /sec
Ethtool(eno2np1 ) stat:            1 (              1) <= rx_packets /sec
Ethtool(eno2np1 ) stat:            1 (              1) <= rx_unicast /sec
Ethtool(eno2np1 ) stat:     74580821 (     74,580,821) <= tx-0.bytes /sec
Ethtool(eno2np1 ) stat:      1243014 (      1,243,014) <= tx-0.packets /sec
Ethtool(eno2np1 ) stat:     74580821 (     74,580,821) <= tx_bytes /sec
Ethtool(eno2np1 ) stat:      1243014 (      1,243,014) <= tx_packets /sec
Ethtool(eno2np1 ) stat:      1243037 (      1,243,037) <= tx_unicast /sec

Mine is quite a bit slower, but it seems to match what I see on the RX
side.

> At this level, if you can verify that CPU:60 is 100% loaded, and packet
> generator is sending more than rx number, then it could work as a valid
> experiment.

i40e receiving on CPU 8:
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni, 84.8 id,  0.0 wa,  0.0 hi, 15.2 si,  0.0 st

ixgbe receiving on CPU 13:
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni, 56.7 id,  0.0 wa,  0.0 hi, 43.3 si,  0.0 st

Both look mostly idle. On the sending side kpktgend_0 is always at 100%.

> > -t 18
> > | Summary                  7,784,946 rx/s                 0 err/s
> > |   receive total          7,784,946 pkt/s        7,784,946 drop/s               0 error/s
> > |     cpu:60               7,784,946 pkt/s        7,784,946 drop/s               0 error/s
> > |   xdp_exception                  0 hit/s
> >
> > after t18 it drop down to 2,…
> > Now I got worse than before since -t8 says 7,5… and it did 8,4 in the
> > morning. Do you have maybe a .config for me in case I did not enable
> > the performance switch?
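For reference, the fallback behind that 64 / 64 * 2 boundary looks
roughly like the sketch below. This is only an illustration, not literal
driver code: the struct names, ring_xmit() and the 64-queue limit are
assumptions, but ixgbe-style drivers follow this shape (one XDP TX ring
per CPU while there are enough rings, ring sharing plus a per-ring
spinlock for up to twice as many CPUs as rings, program rejected beyond
that):

/* Illustrative sketch only; struct names, fields and ring_xmit() are
 * invented, the 64 queue limit is an assumption. */
#define MAX_XDP_QUEUES	64

static DEFINE_STATIC_KEY_FALSE(xdp_locking_key);

static int xdp_setup_tx_resources(void)
{
	if (nr_cpu_ids > MAX_XDP_QUEUES * 2)
		return -ENOMEM;				/* XDP prog is rejected */
	if (nr_cpu_ids > MAX_XDP_QUEUES)
		static_branch_inc(&xdp_locking_key);	/* enable locked TX path */
	return 0;
}

static struct my_tx_ring *xdp_ring_for_cpu(struct my_adapter *ad, int cpu)
{
	/* With more CPUs than rings, several CPUs share one ring. */
	if (static_branch_unlikely(&xdp_locking_key))
		cpu %= MAX_XDP_QUEUES;
	return ad->xdp_ring[cpu];
}

static int xdp_xmit_frame(struct my_adapter *ad, struct xdp_frame *xdpf)
{
	struct my_tx_ring *ring = xdp_ring_for_cpu(ad, smp_processor_id());
	int ret;

	/* The per-ring lock is only taken when rings are shared. */
	if (static_branch_unlikely(&xdp_locking_key))
		spin_lock(&ring->tx_lock);
	ret = ring_xmit(ring, xdpf);			/* driver specific */
	if (static_branch_unlikely(&xdp_locking_key))
		spin_unlock(&ring->tx_lock);
	return ret;
}

The static key keeps the lock out of the TX fast path on machines where
every CPU still owns its own ring.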
> >
>
> I would look for root-cause with perf record +
>  perf report --sort cpu,comm,dso,symbol --no-children

While sending with ixgbe, perf top on that box shows:

| Samples: 621K of event 'cycles', 4000 Hz, Event count (approx.): 49979376685 lost: 0/0 drop: 0/0
| Overhead  CPU  Command     Shared Object  Symbol
|  31.98%  000  kpktgend_0  [kernel]        [k] xas_find
|   6.72%  000  kpktgend_0  [kernel]        [k] pfn_to_dma_pte
|   5.63%  000  kpktgend_0  [kernel]        [k] ixgbe_xmit_frame_ring
|   4.78%  000  kpktgend_0  [kernel]        [k] dma_pte_clear_level
|   3.16%  000  kpktgend_0  [kernel]        [k] __iommu_dma_unmap
|   2.30%  000  kpktgend_0  [kernel]        [k] fq_ring_free_locked
|   1.99%  000  kpktgend_0  [kernel]        [k] __domain_mapping
|   1.82%  000  kpktgend_0  [kernel]        [k] iommu_dma_alloc_iova
|   1.80%  000  kpktgend_0  [kernel]        [k] __iommu_map
|   1.72%  000  kpktgend_0  [kernel]        [k] iommu_pgsize.isra.0
|   1.70%  000  kpktgend_0  [kernel]        [k] __iommu_dma_map
|   1.63%  000  kpktgend_0  [kernel]        [k] alloc_iova_fast
|   1.59%  000  kpktgend_0  [kernel]        [k] _raw_spin_lock_irqsave
|   1.32%  000  kpktgend_0  [kernel]        [k] iommu_map
|   1.30%  000  kpktgend_0  [kernel]        [k] iommu_dma_map_page
|   1.23%  000  kpktgend_0  [kernel]        [k] intel_iommu_iotlb_sync_map
|   1.21%  000  kpktgend_0  [kernel]        [k] xa_find_after
|   1.17%  000  kpktgend_0  [kernel]        [k] ixgbe_poll
|   1.06%  000  kpktgend_0  [kernel]        [k] __iommu_unmap
|   1.04%  000  kpktgend_0  [kernel]        [k] intel_iommu_unmap_pages
|   1.01%  000  kpktgend_0  [kernel]        [k] free_iova_fast
|   0.96%  000  kpktgend_0  [pktgen]        [k] pktgen_thread_worker

The i40e box while sending:

|Samples: 400K of event 'cycles:P', 4000 Hz, Event count (approx.): 80512443924 lost: 0/0 drop: 0/0
|Overhead  CPU  Command     Shared Object  Symbol
|  24.04%  000  kpktgend_0  [kernel]       [k] i40e_lan_xmit_frame
|  17.20%  019  swapper     [kernel]       [k] i40e_napi_poll
|   4.84%  019  swapper     [kernel]       [k] intel_idle_irq
|   4.20%  019  swapper     [kernel]       [k] napi_consume_skb
|   3.00%  000  kpktgend_0  [pktgen]       [k] pktgen_thread_worker
|   2.76%  008  swapper     [kernel]       [k] i40e_napi_poll
|   2.36%  000  kpktgend_0  [kernel]       [k] dma_map_page_attrs
|   1.93%  019  swapper     [kernel]       [k] dma_unmap_page_attrs
|   1.70%  008  swapper     [kernel]       [k] intel_idle_irq
|   1.44%  008  swapper     [kernel]       [k] __udp4_lib_rcv
|   1.44%  008  swapper     [kernel]       [k] __netif_receive_skb_core.constprop.0
|   1.40%  008  swapper     [kernel]       [k] napi_build_skb
|   1.28%  000  kpktgend_0  [kernel]       [k] kfree_skb_reason
|   1.27%  008  swapper     [kernel]       [k] ip_rcv_core
|   1.19%  008  swapper     [kernel]       [k] inet_gro_receive
|   1.01%  008  swapper     [kernel]       [k] kmem_cache_free.part.0

> --Jesper

Sebastian