[PATCH bpf-next 00/13] BPF rbtree next-gen datastructure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This series adds a rbtree datastructure following the "next-gen
datastructure" precedent set by recently-added linked-list [0]. This is
a reimplementation of previous rbtree RFC [1] to use kfunc + kptr
instead of adding a new map type. This series adds a smaller set of API
functions than that RFC - just the minimum needed to support current
cgfifo example scheduler in ongoing sched_ext effort [2], namely:

  bpf_rbtree_add
  bpf_rbtree_remove
  bpf_rbtree_first

The meat of this series is bugfixes and verifier infra work to support
these API functions. Adding more rbtree kfuncs in future patches should
be straightforward as a result.

BPF rbtree uses struct rb_root_cached + existing rbtree lib under the
hood. From the BPF program writer's perspective, a BPF rbtree is very
similar to existing linked list. Consider the following example:

  struct node_data {
    long key;
    long data;
    struct bpf_rb_node node;
  }

  static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
  {
    struct node_data *node_a;
    struct node_data *node_b;

    node_a = container_of(a, struct node_data, node);
    node_b = container_of(b, struct node_data, node);

    return node_a->key < node_b->key;
  }

  private(A) struct bpf_spin_lock glock;
  private(A) struct bpf_rb_root groot __contains(node_data, node);

  /* ... in BPF program */
  struct node_data *n, *m;
  struct bpf_rb_node *res;

  n = bpf_obj_new(typeof(*n));
  if (!n)
    /* skip */
  n->key = 5;
  n->data = 10;

  bpf_spin_lock(&glock);
  bpf_rbtree_add(&groot, &n->node, less);
  bpf_spin_unlock(&glock);

  bpf_spin_lock(&glock);
  res = bpf_rbtree_first(&groot);
  if (!res)
    /* skip */
  res = bpf_rbtree_remove(&groot, res);
  if (!res)
    /* skip */
  bpf_spin_unlock(&glock);

  m = container_of(res, struct node_data, node);
  bpf_obj_drop(m);

Some obvious similarities:

  * Special bpf_rb_root and bpf_rb_node types have same semantics
    as bpf_list_head and bpf_list_node, respectively
  * __contains is used to associated node type with root
  * The spin_lock associated with a rbtree must be held when using
    rbtree API kfuncs
  * Nodes are allocated via bpf_obj_new and dropped via bpf_obj_drop
  * Rbtree takes ownership of node lifetime when a node is added.
    Removing a node gives ownership back to the program, requiring a
    bpf_obj_drop before program exit

Some new additions as well:

  * Support for callbacks in kfunc args is added to enable 'less'
    callback use above
  * bpf_rbtree_first's release_on_unlock handling is a bit novel, as
    it's the first next-gen ds API function to release_on_unlock its
    return reg instead of nonexistent node arg
  * Because all references to nodes already added to the rbtree are
    'non-owning', i.e. release_on_unlock and PTR_UNTRUSTED,
    bpf_rbtree_remove must accept such a reference in order to remove it
    from the tree

It seemed better to special-case some 'new additions' verifier logic for
now instead of adding new type flags and concepts, as some of the concepts
(e.g. PTR_UNTRUSTED + release_on_unlock) need a refactoring pass before
we pile more on. Regardless, the net-new verifier logic added in this
patchset is minimal. Verifier changes are mostly generaliztion of
existing linked-list logic and some bugfixes.

A note on naming: 

Some existing list-specific helpers are renamed to 'datastructure_head',
'datastructure_node', etc. Probably a more concise and accurate naming
would be something like 'ng_ds_head' for 'next-gen datastructure'.

For folks who weren't following the conversations over past few months, 
though, such a naming scheme might seem to indicate that _all_ next-gen
datastructures must have certain semantics, like release_on_unlock,
which aren't necessarily required. For this reason I'd like some
feedback on how to name things.

Summary of patches:

  Patches 1, 2, and 10 are bugfixes which are likely worth applying
  independently of rbtree implementation. Patch 12 is somewhere between
  nice-to-have and bugfix.

  Patches 3 and 4 are nonfunctional refactor/rename.

  Patches 5 - 9 implement the meat of rbtree support in this series,
  gradually building up to implemented kfuncs that verify as expected.
  Patch 11 adds the bpf_rbtree_{add,first,remove} to bpf_experimental.h.

  Patch 13 adds tests.

  [0]: lore.kernel.org/bpf/20221118015614.2013203-1-memxor@xxxxxxxxx
  [1]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@xxxxxx
  [2]: lore.kernel.org/bpf/20221130082313.3241517-1-tj@xxxxxxxxxx

Future work:
  Enabling writes to release_on_unlock refs should be done before the
  functionality of BPF rbtree can truly be considered complete.
  Implementing this proved more complex than expected so it's been
  pushed off to a future patch.

Dave Marchevsky (13):
  bpf: Loosen alloc obj test in verifier's reg_btf_record
  bpf: map_check_btf should fail if btf_parse_fields fails
  bpf: Minor refactor of ref_set_release_on_unlock
  bpf: rename list_head -> datastructure_head in field info types
  bpf: Add basic bpf_rb_{root,node} support
  bpf: Add bpf_rbtree_{add,remove,first} kfuncs
  bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args
  bpf: Add callback validation to kfunc verifier logic
  bpf: Special verifier handling for bpf_rbtree_{remove, first}
  bpf, x86: BPF_PROBE_MEM handling for insn->off < 0
  bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h
  libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj
    type
  selftests/bpf: Add rbtree selftests

 arch/x86/net/bpf_jit_comp.c                   | 123 +++--
 include/linux/bpf.h                           |  21 +-
 include/uapi/linux/bpf.h                      |  11 +
 kernel/bpf/btf.c                              | 181 ++++---
 kernel/bpf/helpers.c                          |  75 ++-
 kernel/bpf/syscall.c                          |  33 +-
 kernel/bpf/verifier.c                         | 506 +++++++++++++++---
 tools/include/uapi/linux/bpf.h                |  11 +
 tools/lib/bpf/libbpf.c                        |  50 +-
 .../testing/selftests/bpf/bpf_experimental.h  |  24 +
 .../selftests/bpf/prog_tests/linked_list.c    |  12 +-
 .../testing/selftests/bpf/prog_tests/rbtree.c | 184 +++++++
 tools/testing/selftests/bpf/progs/rbtree.c    | 180 +++++++
 .../progs/rbtree_btf_fail__add_wrong_type.c   |  48 ++
 .../progs/rbtree_btf_fail__wrong_node_type.c  |  21 +
 .../testing/selftests/bpf/progs/rbtree_fail.c | 263 +++++++++
 16 files changed, 1549 insertions(+), 194 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/rbtree_fail.c

-- 
2.30.2





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux