[PATCH bpf-next 0/9] bpf: cpumap: enable GRO for XDP_PASS frames

Recently, I've been looking through my old XDP hints tree[0] to check
whether some patches not directly related to hints can be sent
standalone. Roughly at the same time, Daniel appeared and asked[1] about
GRO for cpumap from that tree.

Currently, cpumap uses its own kthread, which processes cpumap-redirected
frames in batches of 8, without any weighting (but with rescheduling
points). The resulting skbs get passed to the stack via
netif_receive_skb_list(), which means no GRO happens.
Even though we can't currently pass checksum status from the drivers,
in many cases GRO performs better than listified Rx without
aggregation, as confirmed by the tests below.
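
To illustrate why aggregation can pay off, here is a toy userspace model,
not kernel code. It assumes a hypothetical struct toy_frame where a single
flow_id stands in for the headers real GRO compares, and it merges adjacent
frames of the same flow so that the upper stack would be traversed once per
merged packet instead of once per frame:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified frame descriptor. Real GRO matches on full
 * protocol headers, not on a single flow id.
 */
struct toy_frame {
	unsigned int flow_id;	/* stands in for the flow's headers */
	unsigned int len;	/* payload length in bytes */
};

/*
 * Coalesce runs of adjacent frames that share a flow id, in place.
 * Returns the number of packets left, i.e. how many times the upper
 * stack would have to be traversed after aggregation.
 */
static size_t toy_gro_coalesce(struct toy_frame *frames, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out && frames[out - 1].flow_id == frames[i].flow_id) {
			/* Same flow as the previous packet: merge payload. */
			frames[out - 1].len += frames[i].len;
			continue;
		}

		/* Different flow: start a new aggregated packet. */
		frames[out++] = frames[i];
	}

	return out;
}
```

With a batch of 8 frames where most neighbours belong to the same flow,
the function returns far fewer than 8 packets; with fully interleaved
flows it returns 8 and nothing is gained.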

In order to enable GRO in cpumap, we need to do the following:

* patches 1-3: allow creating CPU-pinned threaded NAPIs;
* patch 4: switch cpumap from a custom kthread to a CPU-pinned
  threaded NAPI.

Additional improvements:

* patch 5: optimize XDP_PASS in cpumap by using arrays instead of linked
  lists;
* patches 6-7: introduce and use a function to get skbs from the NAPI
  percpu caches in bulk, not one at a time;
* patches 8-9: use that function in veth and remove the one that was
  superseded by it.
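
The idea behind the bulk getter can be sketched in userspace as follows.
This is only a toy model under stated assumptions, not the implementation
of napi_skb_cache_get_bulk(): struct toy_cache, its capacity, the object
size and the malloc()-based refill are all hypothetical. The point it
shows is that a batch is served with one cache check and at most one
refill, instead of repeating both per object:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CACHE_SIZE	64	/* hypothetical capacity, not the kernel's */
#define TOY_OBJ_SIZE	256	/* hypothetical object size in bytes */

/* Toy stand-in for a per-CPU object cache. */
struct toy_cache {
	void *objs[TOY_CACHE_SIZE];
	size_t count;
};

/* Top the cache up until it holds @want objects or is full. */
static void toy_cache_refill(struct toy_cache *c, size_t want)
{
	while (c->count < want && c->count < TOY_CACHE_SIZE) {
		void *obj = malloc(TOY_OBJ_SIZE);

		if (!obj)
			break;
		c->objs[c->count++] = obj;
	}
}

/*
 * Take up to @n objects from the cache in one call. Returns how many
 * were stored in @out; this is less than @n if allocation failed or
 * if @n exceeds the cache capacity.
 */
static size_t toy_cache_get_bulk(struct toy_cache *c, void **out, size_t n)
{
	size_t i;

	if (c->count < n)
		toy_cache_refill(c, n);	/* one refill for the whole batch */
	if (n > c->count)
		n = c->count;

	for (i = 0; i < n; i++)
		out[i] = c->objs[--c->count];

	return n;
}
```

A caller processing frames in batches of 8 would ask for 8 objects up
front and then fill them in a tight loop, handling the case where fewer
than 8 were returned.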

My trafficgen UDP GRO tests, small frame sizes:

                GRO off    GRO on
baseline        2.7        N/A       Mpps
thread GRO      2.3        4         Mpps
thr bulk GRO    2.4        4.7       Mpps

1...2 diff      -17        +48       %
1...3 diff      -14        +75       %

Daniel reported a +14% throughput gain in neper's TCP RR tests[2].

[0] https://github.com/alobakin/linux/tree/xdp_hints
[1] https://lore.kernel.org/bpf/cadda351-6e93-4568-ba26-21a760bf9a57@xxxxxxxxxxxxxxxx
[2] https://lore.kernel.org/bpf/merfatcdvwpx2lj4j2pahhwp4vihstpidws3jwljwazhh76xkd@t5vsh4gvk4mh

Alexander Lobakin (7):
  firmware/psci: fix missing '%u' format literal in
    kthread_create_on_cpu()
  kthread: allow vararg kthread_{create,run}_on_cpu()
  bpf: cpumap: reuse skb array instead of a linked list to chain skbs
  net: skbuff: introduce napi_skb_cache_get_bulk()
  bpf: cpumap: switch to napi_skb_cache_get_bulk()
  veth: use napi_skb_cache_get_bulk() instead of xdp_alloc_skb_bulk()
  xdp: remove xdp_alloc_skb_bulk()

Lorenzo Bianconi (2):
  net: napi: add ability to create CPU-pinned threaded NAPI
  bpf: cpumap: use CPU-pinned threaded NAPI w/GRO instead of kthread

 include/linux/kthread.h              |  51 ++++---
 include/linux/netdevice.h            |  35 ++++-
 include/linux/skbuff.h               |   1 +
 include/net/xdp.h                    |   1 -
 drivers/firmware/psci/psci_checker.c |   2 +-
 drivers/net/veth.c                   |   3 +-
 kernel/bpf/cpumap.c                  | 210 ++++++++++++---------------
 kernel/kthread.c                     |  22 +--
 net/core/dev.c                       |  18 ++-
 net/core/skbuff.c                    |  62 ++++++++
 net/core/xdp.c                       |  10 --
 11 files changed, 251 insertions(+), 164 deletions(-)

-- 
2.46.0




