Re: [Intel-wired-lan] FW: [PATCH bpf-next 2/4] xsk: allow AF_XDP sockets to receive packets directly from a queue

On 10/20/2019 10:12 AM, Björn Töpel wrote:
On Sun, 20 Oct 2019 at 12:15, Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:

On Fri, Oct 18, 2019 at 05:45:26PM -0700, Samudrala, Sridhar wrote:
On 10/18/2019 5:14 PM, Alexei Starovoitov wrote:
On Fri, Oct 18, 2019 at 11:40:07AM -0700, Samudrala, Sridhar wrote:

Perf report for "AF_XDP default rxdrop" with patched kernel - mitigations ON
==========================================================================
Samples: 44K of event 'cycles', Event count (approx.): 38532389541
Overhead  Command          Shared Object              Symbol
    15.31%  ksoftirqd/28     [i40e]                     [k] i40e_clean_rx_irq_zc
    10.50%  ksoftirqd/28     bpf_prog_80b55d8a76303785  [k] bpf_prog_80b55d8a76303785
     9.48%  xdpsock          [i40e]                     [k] i40e_clean_rx_irq_zc
     8.62%  xdpsock          xdpsock                    [.] main
     7.11%  ksoftirqd/28     [kernel.vmlinux]           [k] xsk_rcv
     5.81%  ksoftirqd/28     [kernel.vmlinux]           [k] xdp_do_redirect
     4.46%  xdpsock          bpf_prog_80b55d8a76303785  [k] bpf_prog_80b55d8a76303785
     3.83%  xdpsock          [kernel.vmlinux]           [k] xsk_rcv

Why is everything duplicated?
Does the same code run in different tasks?

Yes. It looks like these functions run from both the app (xdpsock) context and the ksoftirqd context, presumably because the RX softirq executes either in ksoftirqd or in the context of whichever task it interrupts (here xdpsock), and perf attributes the samples accordingly.


     2.81%  ksoftirqd/28     [kernel.vmlinux]           [k] bpf_xdp_redirect_map
     2.78%  ksoftirqd/28     [kernel.vmlinux]           [k] xsk_map_lookup_elem
     2.44%  xdpsock          [kernel.vmlinux]           [k] xdp_do_redirect
     2.19%  ksoftirqd/28     [kernel.vmlinux]           [k] __xsk_map_redirect
     1.62%  ksoftirqd/28     [kernel.vmlinux]           [k] xsk_umem_peek_addr
     1.57%  xdpsock          [kernel.vmlinux]           [k] xsk_umem_peek_addr
     1.32%  ksoftirqd/28     [kernel.vmlinux]           [k] dma_direct_sync_single_for_cpu
     1.28%  xdpsock          [kernel.vmlinux]           [k] bpf_xdp_redirect_map
     1.15%  xdpsock          [kernel.vmlinux]           [k] dma_direct_sync_single_for_device
     1.12%  xdpsock          [kernel.vmlinux]           [k] xsk_map_lookup_elem
     1.06%  xdpsock          [kernel.vmlinux]           [k] __xsk_map_redirect
     0.94%  ksoftirqd/28     [kernel.vmlinux]           [k] dma_direct_sync_single_for_device
     0.75%  ksoftirqd/28     [kernel.vmlinux]           [k] __x86_indirect_thunk_rax
     0.66%  ksoftirqd/28     [i40e]                     [k] i40e_clean_programming_status
     0.64%  ksoftirqd/28     [kernel.vmlinux]           [k] net_rx_action
     0.64%  swapper          [kernel.vmlinux]           [k] intel_idle
     0.62%  ksoftirqd/28     [i40e]                     [k] i40e_napi_poll
     0.57%  xdpsock          [kernel.vmlinux]           [k] dma_direct_sync_single_for_cpu

Perf report for "AF_XDP direct rxdrop" with patched kernel - mitigations ON
==========================================================================
Samples: 46K of event 'cycles', Event count (approx.): 38387018585
Overhead  Command          Shared Object             Symbol
    21.94%  ksoftirqd/28     [i40e]                    [k] i40e_clean_rx_irq_zc
    14.36%  xdpsock          xdpsock                   [.] main
    11.53%  ksoftirqd/28     [kernel.vmlinux]          [k] xsk_rcv
    11.32%  xdpsock          [i40e]                    [k] i40e_clean_rx_irq_zc
     4.02%  xdpsock          [kernel.vmlinux]          [k] xsk_rcv
     2.91%  ksoftirqd/28     [kernel.vmlinux]          [k] xdp_do_redirect
     2.45%  ksoftirqd/28     [kernel.vmlinux]          [k] xsk_umem_peek_addr
     2.19%  xdpsock          [kernel.vmlinux]          [k] xsk_umem_peek_addr
     2.08%  ksoftirqd/28     [kernel.vmlinux]          [k] bpf_direct_xsk
     2.07%  ksoftirqd/28     [kernel.vmlinux]          [k] dma_direct_sync_single_for_cpu
     1.53%  ksoftirqd/28     [kernel.vmlinux]          [k] dma_direct_sync_single_for_device
     1.39%  xdpsock          [kernel.vmlinux]          [k] dma_direct_sync_single_for_device
     1.22%  ksoftirqd/28     [kernel.vmlinux]          [k] xdp_get_xsk_from_qid
     1.12%  ksoftirqd/28     [i40e]                    [k] i40e_clean_programming_status
     0.96%  ksoftirqd/28     [i40e]                    [k] i40e_napi_poll
     0.95%  ksoftirqd/28     [kernel.vmlinux]          [k] net_rx_action
     0.89%  xdpsock          [kernel.vmlinux]          [k] xdp_do_redirect
     0.83%  swapper          [i40e]                    [k] i40e_clean_rx_irq_zc
     0.70%  swapper          [kernel.vmlinux]          [k] intel_idle
     0.66%  xdpsock          [kernel.vmlinux]          [k] dma_direct_sync_single_for_cpu
     0.60%  xdpsock          [kernel.vmlinux]          [k] bpf_direct_xsk
     0.50%  ksoftirqd/28     [kernel.vmlinux]          [k] xsk_umem_discard_addr

Based on the perf reports comparing AF_XDP default and direct rxdrop, we can say that
the AF_XDP direct rxdrop codepath avoids the overhead of going through these functions:
  bpf_prog_xxx
  bpf_xdp_redirect_map
  xsk_map_lookup_elem
  __xsk_map_redirect
With AF_XDP direct, xsk_rcv() is instead called directly via bpf_direct_xsk() in xdp_do_redirect(), as sketched below.
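For reference, a minimal sketch of that direct path. The function names are taken from
the perf report above; the exact signatures and error handling are my assumptions, so
see the actual patch for the real code:

static int bpf_direct_xsk(struct net_device *dev, struct xdp_buff *xdp)
{
	struct xdp_sock *xsk;

	/* Find the AF_XDP socket bound directly to this RX queue. */
	xsk = xdp_get_xsk_from_qid(dev, xdp->rxq->queue_index);
	if (!xsk)
		return -EINVAL;

	/* Hand the frame straight to the socket, skipping the BPF
	 * program and the xskmap lookup entirely. */
	return xsk_rcv(xsk, xdp);
}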

I don't think you're identifying the overhead correctly.
xsk_map_lookup_elem is 1%
but bpf_xdp_redirect_map() is supposed to call __xsk_map_lookup_elem(),
which is a different function:
ffffffff81493fe0 T __xsk_map_lookup_elem
ffffffff81492e80 t xsk_map_lookup_elem

10% for bpf_prog_80b55d8a76303785 is huge.
It's the actual code of the program _without_ any helpers.
What does the program actually look like?

It is the XDP program that is loaded via xsk_load_xdp_prog() in tools/lib/bpf/xsk.c:
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/lib/bpf/xsk.c#n268
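Roughly, the C equivalent of that generated program (paraphrased from the comment in
xsk.c; details may differ slightly from the instruction-level program libbpf builds) is:

SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
{
	int index = ctx->rx_queue_index;

	/* A set entry here means that the corresponding queue_id
	 * has an active AF_XDP socket bound to it. */
	if (bpf_map_lookup_elem(&xsks_map, &index))
		return bpf_redirect_map(&xsks_map, index, 0);

	return XDP_PASS;
}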

I see. It looks like map_gen_lookup was never implemented for the xskmap.
How about adding it first, the way array_map_gen_lookup() is implemented?
This will easily give a 2x perf gain.
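For context, map_gen_lookup() lets the verifier inline the lookup as BPF instructions
in place of the helper call. A sketch of what it could look like for the xskmap,
modeled on array_map_gen_lookup() in kernel/bpf/arraymap.c (the struct xsk_map field
names and layout here are assumptions):

static u32 xsk_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
{
	const int ret = BPF_REG_0, mp = BPF_REG_1, index = BPF_REG_2;
	struct bpf_insn *insn = insn_buf;

	/* ret = *index; reject out-of-bounds queue ids */
	*insn++ = BPF_LDX_MEM(BPF_W, ret, index, 0);
	*insn++ = BPF_JMP_IMM(BPF_JGE, ret, map->max_entries, 5);
	/* ret = &map->xsk_map[ret]; assumes an embedded pointer array */
	*insn++ = BPF_ALU64_IMM(BPF_LSH, ret, ilog2(sizeof(struct xdp_sock *)));
	*insn++ = BPF_ALU64_IMM(BPF_ADD, mp, offsetof(struct xsk_map, xsk_map));
	*insn++ = BPF_ALU64_REG(BPF_ADD, ret, mp);
	/* ret = *(struct xdp_sock **)ret, or NULL when out of bounds */
	*insn++ = BPF_LDX_MEM(BPF_DW, ret, ret, 0);
	*insn++ = BPF_JMP_IMM(BPF_JA, 0, 0, 1);
	*insn++ = BPF_MOV64_IMM(ret, 0);
	return insn - insn_buf;
}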

I guess we should implement this for devmaps as well now that we allow
lookups into those.

However, in this particular example, the lookup from BPF is not actually
needed, since bpf_redirect_map() will return a configurable error value
when the map lookup fails (for exactly this use case).

So replacing:

if (bpf_map_lookup_elem(&xsks_map, &index))
     return bpf_redirect_map(&xsks_map, index, 0);

with simply

return bpf_redirect_map(&xsks_map, index, XDP_PASS);

would save the call to xsk_map_lookup_elem().
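Put together, the whole default program would then collapse to something like this
self-contained sketch (libbpf map-definition conventions assumed; this is not the
instruction-level program that xsk.c generates):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__uint(max_entries, 64);	/* assumption: sized to the RX queue count */
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
} xsks_map SEC(".maps");

SEC("xdp")
int xdp_sock_prog(struct xdp_md *ctx)
{
	int index = ctx->rx_queue_index;

	/* If no socket is bound at this queue index, the in-kernel lookup
	 * fails and the flags argument (XDP_PASS) is returned instead, so
	 * no explicit bpf_map_lookup_elem() is needed. */
	return bpf_redirect_map(&xsks_map, index, XDP_PASS);
}

char _license[] SEC("license") = "GPL";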


Thanks for the reminder! I just submitted a patch. Still, doing the
map_gen_lookup() for the xsk/devmaps makes sense!


I tried Björn's patch that avoids the lookups in the BPF program:
https://lore.kernel.org/netdev/20191021105938.11820-1-bjorn.topel@xxxxxxxxx/

With this patch I am also seeing around a 3-4% increase in xdpsock rxdrop performance,
and the perf report looks like this:

Samples: 44K of event 'cycles', Event count (approx.): 38749965204
Overhead  Command          Shared Object              Symbol
  16.06%  ksoftirqd/28     [i40e]                     [k] i40e_clean_rx_irq_zc
  10.18%  ksoftirqd/28     bpf_prog_3c8251c7e0fef8db  [k] bpf_prog_3c8251c7e0fef8db
  10.15%  xdpsock          [i40e]                     [k] i40e_clean_rx_irq_zc
  10.06%  ksoftirqd/28     [kernel.vmlinux]           [k] xsk_rcv
   7.45%  xdpsock          xdpsock                    [.] main
   5.76%  ksoftirqd/28     [kernel.vmlinux]           [k] xdp_do_redirect
   4.51%  xdpsock          bpf_prog_3c8251c7e0fef8db  [k] bpf_prog_3c8251c7e0fef8db
   3.67%  xdpsock          [kernel.vmlinux]           [k] xsk_rcv
   3.06%  ksoftirqd/28     [kernel.vmlinux]           [k] bpf_xdp_redirect_map
   2.34%  ksoftirqd/28     [kernel.vmlinux]           [k] __xsk_map_redirect
   2.33%  xdpsock          [kernel.vmlinux]           [k] xdp_do_redirect
   1.69%  ksoftirqd/28     [kernel.vmlinux]           [k] xsk_umem_peek_addr
   1.69%  xdpsock          [kernel.vmlinux]           [k] xsk_umem_peek_addr
   1.42%  ksoftirqd/28     [kernel.vmlinux]           [k] dma_direct_sync_single_for_cpu
   1.19%  xdpsock          [kernel.vmlinux]           [k] bpf_xdp_redirect_map
   1.13%  xdpsock          [kernel.vmlinux]           [k] dma_direct_sync_single_for_device
   0.95%  ksoftirqd/28     [kernel.vmlinux]           [k] dma_direct_sync_single_for_device
   0.92%  swapper          [kernel.vmlinux]           [k] intel_idle
   0.92%  xdpsock          [kernel.vmlinux]           [k] __xsk_map_redirect
   0.80%  ksoftirqd/28     [kernel.vmlinux]           [k] __x86_indirect_thunk_rax
   0.73%  ksoftirqd/28     [i40e]                     [k] i40e_clean_programming_status
   0.71%  ksoftirqd/28     [kernel.vmlinux]           [k] __xsk_map_lookup_elem
   0.63%  ksoftirqd/28     [kernel.vmlinux]           [k] net_rx_action
   0.62%  ksoftirqd/28     [i40e]                     [k] i40e_napi_poll
   0.58%  xdpsock          [kernel.vmlinux]           [k] dma_direct_sync_single_for_cpu

So with this patch applied, the direct receive performance improvement comes down from 46% to 42%.
I think it is still substantial enough to provide an option to allow direct receive for
certain use cases. If it is OK, I can re-spin and submit the patches on top of the latest bpf-next.

Thanks
Sridhar