[PATCH RFC bpf-next 00/20] traits: Per packet metadata KV store

Currently, the only way to attach information to a sk_buff that travels 
through the network stack is by using the mark field. This 32-bit field
is highly versatile - it can be read in firewall rules, drive routing 
decisions, and be accessed by BPF programs.

However, its limited capacity creates competition for bits, restricting 
its practical use.

To remedy this, we propose using part of the packet headroom to store 
metadata. This would allow:
- Tracing packets through the network stack and across the kernel-user
  space boundary, by assigning them a unique ID.
- Metadata-driven packet redirection, routing, and socket steering with
  early classification in XDP.
- Extracting information from encapsulation headers and sharing it with
  user space or vice versa.
- Exposing XDP RX Metadata, like the timestamp, to the rest of the 
  network stack.

We originally proposed extending XDP metadata (binary blob storage
that also lives in the headroom) to expose it throughout the network
stack. However, based on feedback at LPC 2024 [1]:
- sharing a binary blob amongst different applications is hard.
- exposing a binary blob to userspace is awkward.
we've shifted to a limited KV store in the headroom.

To differentiate this from the overloaded "metadata" term, it's 
tentatively called "packet traits".

A get() / set() / delete() API is exposed to BPF to store and 
retrieve traits. 
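
As a rough illustration of the get() / set() / delete() semantics only,
here is a minimal userspace sketch. Everything in it is hypothetical:
the toy_* names, the 64-byte capacity, and the packed key/length/value
encoding are assumptions made for the example, not the layout or the
BPF API implemented by this series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model, NOT the layout used by the series. */
#define TOY_TRAITS_CAP 64	/* bytes reserved for traits (assumed) */

struct toy_traits {
	uint8_t used;			/* bytes of buf currently in use */
	uint8_t buf[TOY_TRAITS_CAP];	/* packed entries: key, len, value */
};

/* Return the offset of the entry for key, or -1 if it is not stored. */
int toy_find(const struct toy_traits *t, uint8_t key)
{
	int off = 0;

	assert(t->used <= TOY_TRAITS_CAP);
	while (off < t->used) {
		if (t->buf[off] == key)
			return off;
		off += 2 + t->buf[off + 1];	/* skip key, len and value */
	}
	return -1;
}

/* Remove key; returns 0 on success, -1 if the key was not stored. */
int toy_trait_del(struct toy_traits *t, uint8_t key)
{
	int off = toy_find(t, key);
	int size;

	if (off < 0)
		return -1;
	size = 2 + t->buf[off + 1];
	/* Close the gap by moving later entries down. */
	memmove(t->buf + off, t->buf + off + size, t->used - off - size);
	t->used -= size;
	return 0;
}

/* Store len bytes under key, replacing any old value; -1 if out of space. */
int toy_trait_set(struct toy_traits *t, uint8_t key,
		  const void *val, uint8_t len)
{
	int off = toy_find(t, key);
	int old = off < 0 ? 0 : 2 + t->buf[off + 1];

	if (t->used - old + 2 + len > TOY_TRAITS_CAP)
		return -1;
	if (off >= 0)
		toy_trait_del(t, key);
	t->buf[t->used] = key;
	t->buf[t->used + 1] = len;
	memcpy(t->buf + t->used + 2, val, len);
	t->used += 2 + len;
	return 0;
}

/* Copy the value for key into out; returns its length, or -1. */
int toy_trait_get(const struct toy_traits *t, uint8_t key,
		  void *out, uint8_t out_len)
{
	int off = toy_find(t, key);

	if (off < 0 || t->buf[off + 1] > out_len)
		return -1;
	memcpy(out, t->buf + off + 2, t->buf[off + 1]);
	return t->buf[off + 1];
}
```

In this toy, setting an existing key replaces its value, and a get()
on a missing key fails rather than returning a default. Whether the
real API should behave the same way is a design choice for the series,
not something this sketch settles.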

Initial benchmarks in XDP are promising, with get() / set() costing
about as much as an indirect function call. See patch 6: "trait:
Replace memmove calls with inline move" for full results.

We imagine adding first-class support for this in netfilter (setting
/ checking traits in rules) and routing (selecting routing tables
based on traits) in follow-up work.
We also envisage a first-class userspace API for storing and
retrieving traits in the future.

To co-exist with the existing XDP metadata area, traits are stored at
the start of the headroom:

| xdp_frame | traits | headroom | XDP metadata | data / packet |

Traits and XDP metadata are not allowed to overlap.
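
The no-overlap rule can be pictured with a small userspace sketch.
The struct and the toy_* helpers below are hypothetical and exist only
to illustrate the diagram above: traits grow up from just after the
xdp_frame, XDP metadata grows down from the packet data, and the two
must not meet. The series' actual checks may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a buffer's headroom; sizes in bytes. */
struct toy_headroom {
	size_t total;		/* buffer start to start of packet data */
	size_t frame_len;	/* xdp_frame at the very start */
	size_t traits_len;	/* traits, directly after the xdp_frame */
	size_t meta_len;	/* XDP metadata, directly before the data */
};

/* Offset (from buffer start) at which the traits end. */
size_t toy_traits_end(const struct toy_headroom *h)
{
	return h->frame_len + h->traits_len;
}

/* Offset (from buffer start) at which the XDP metadata starts. */
size_t toy_meta_start(const struct toy_headroom *h)
{
	assert(h->meta_len <= h->total);	/* caller must check first */
	return h->total - h->meta_len;
}

/* True if traits and XDP metadata both fit without overlapping. */
bool toy_layout_ok(const struct toy_headroom *h)
{
	if (h->meta_len > h->total)
		return false;
	return toy_traits_end(h) <= toy_meta_start(h);
}
```

Whatever lies between the end of the traits and the start of the
metadata is the remaining free headroom in the diagram.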

Like XDP metadata, this relies on there being sufficient headroom
available. Piggybacking on that work, traits are currently only
supported:
- On ingress.
- By NIC drivers that support XDP metadata.
- When an XDP program is attached.
This limits the applicability of traits, but future work that
guarantees sufficient headroom through other means should allow
these restrictions to be lifted.

There are still a number of open questions:
- What sizes of values should be allowed? See patch 1 "trait: limited KV
  store for packet metadata".
- How should we handle skb clones? See patch 16 "trait: Support sk_buffs".
- How should trait keys be allocated? See patch 18 "trait: registration
  API".
- How should traits work with GRO? Could an API let us specify policies 
  for how traits should be merged? See patch 18 "trait: registration
  API".

[1] https://lpc.events/event/18/contributions/1935/

Cc: jakub@xxxxxxxxxxxxxx
Cc: hawk@xxxxxxxxxx
Cc: yan@xxxxxxxxxxxxxx
Cc: jbrandeburg@xxxxxxxxxxxxxx
Cc: thoiland@xxxxxxxxxx
Cc: lbiancon@xxxxxxxxxx

To: netdev@xxxxxxxxxxxxxxx
To: bpf@xxxxxxxxxxxxxxx

Signed-off-by: Arthur Fabre <afabre@xxxxxxxxxxxxxx>
---
Arthur Fabre (19):
      trait: limited KV store for packet metadata
      trait: XDP support
      trait: basic XDP selftest
      trait: basic XDP benchmark
      trait: Replace memcpy calls with inline copies
      trait: Replace memmove calls with inline move
      xdp: Track if metadata is supported in xdp_frame <> xdp_buff conversions
      trait: Propagate presence of traits to sk_buff
      bnxt: Propagate trait presence to skb
      ice: Propagate trait presence to skb
      veth: Propagate trait presence to skb
      virtio_net: Propagate trait presence to skb
      mlx5: Propagate trait presence to skb
      xdp generic: Propagate trait presence to skb
      trait: Support sk_buffs
      trait: Allow socket filters to access traits
      trait: registration API
      trait: Sync linux/bpf.h to tools/ for trait registration
      trait: register traits in benchmarks and tests

Jesper Dangaard Brouer (1):
      mlx5: move xdp_buff scope one level up

 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |   4 +
 drivers/net/ethernet/intel/ice/ice_txrx.c          |   4 +
 drivers/net/ethernet/intel/ice/ice_xsk.c           |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.c    |   6 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h    |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    | 114 ++++----
 drivers/net/veth.c                                 |   4 +
 drivers/net/virtio_net.c                           |   8 +-
 include/linux/bpf-netns.h                          |  12 +
 include/linux/skbuff.h                             |  33 ++-
 include/net/net_namespace.h                        |   6 +
 include/net/netns/trait.h                          |  22 ++
 include/net/trait.h                                | 288 +++++++++++++++++++++
 include/net/xdp.h                                  |  42 ++-
 include/uapi/linux/bpf.h                           |  26 ++
 kernel/bpf/net_namespace.c                         |  54 ++++
 kernel/bpf/syscall.c                               |  26 ++
 kernel/bpf/verifier.c                              |  39 ++-
 net/core/dev.c                                     |   1 +
 net/core/filter.c                                  |  43 ++-
 net/core/skbuff.c                                  |  25 +-
 net/core/xdp.c                                     |  50 ++++
 tools/include/uapi/linux/bpf.h                     |  26 ++
 tools/testing/selftests/bpf/Makefile               |   2 +
 tools/testing/selftests/bpf/bench.c                |  11 +
 tools/testing/selftests/bpf/bench.h                |   1 +
 .../selftests/bpf/benchs/bench_xdp_traits.c        | 191 ++++++++++++++
 .../testing/selftests/bpf/prog_tests/xdp_traits.c  |  51 ++++
 .../testing/selftests/bpf/progs/bench_xdp_traits.c | 131 ++++++++++
 .../testing/selftests/bpf/progs/test_xdp_traits.c  |  94 +++++++
 31 files changed, 1259 insertions(+), 69 deletions(-)
---
base-commit: 42ba8a49d085e0c2ad50fb9a8ec954c9762b6e01
change-id: 20250305-afabre-traits-010-rfc2-a8e4de0c490b

Best regards,
-- 
Arthur Fabre <afabre@xxxxxxxxxxxxxx>