This series adds a rbtree datastructure following the "next-gen datastructure" precedent set by recently-added linked-list [0]. This is a reimplementation of previous rbtree RFC [1] to use kfunc + kptr instead of adding a new map type. This series adds a smaller set of API functions than that RFC - just the minimum needed to support current cgfifo example scheduler in ongoing sched_ext effort [2], namely: bpf_rbtree_add bpf_rbtree_remove bpf_rbtree_first The meat of this series is bugfixes and verifier infra work to support these API functions. Adding more rbtree kfuncs in future patches should be straightforward as a result. BPF rbtree uses struct rb_root_cached + existing rbtree lib under the hood. From the BPF program writer's perspective, a BPF rbtree is very similar to existing linked list. Consider the following example: struct node_data { long key; long data; struct bpf_rb_node node; } static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b) { struct node_data *node_a; struct node_data *node_b; node_a = container_of(a, struct node_data, node); node_b = container_of(b, struct node_data, node); return node_a->key < node_b->key; } private(A) struct bpf_spin_lock glock; private(A) struct bpf_rb_root groot __contains(node_data, node); /* ... in BPF program */ struct node_data *n, *m; struct bpf_rb_node *res; n = bpf_obj_new(typeof(*n)); if (!n) /* skip */ n->key = 5; n->data = 10; bpf_spin_lock(&glock); bpf_rbtree_add(&groot, &n->node, less); bpf_spin_unlock(&glock); bpf_spin_lock(&glock); res = bpf_rbtree_first(&groot); if (!res) /* skip */ res = bpf_rbtree_remove(&groot, res); if (!res) /* skip */ bpf_spin_unlock(&glock); m = container_of(res, struct node_data, node); bpf_obj_drop(m); Some obvious similarities: * Special bpf_rb_root and bpf_rb_node types have same semantics as bpf_list_head and bpf_list_node, respectively * __contains is used to associated node type with root * The spin_lock associated with a rbtree must be held when using rbtree API kfuncs * Nodes are allocated via bpf_obj_new and dropped via bpf_obj_drop * Rbtree takes ownership of node lifetime when a node is added. Removing a node gives ownership back to the program, requiring a bpf_obj_drop before program exit Some new additions as well: * Support for callbacks in kfunc args is added to enable 'less' callback use above * bpf_rbtree_first's release_on_unlock handling is a bit novel, as it's the first next-gen ds API function to release_on_unlock its return reg instead of nonexistent node arg * Because all references to nodes already added to the rbtree are 'non-owning', i.e. release_on_unlock and PTR_UNTRUSTED, bpf_rbtree_remove must accept such a reference in order to remove it from the tree It seemed better to special-case some 'new additions' verifier logic for now instead of adding new type flags and concepts, as some of the concepts (e.g. PTR_UNTRUSTED + release_on_unlock) need a refactoring pass before we pile more on. Regardless, the net-new verifier logic added in this patchset is minimal. Verifier changes are mostly generaliztion of existing linked-list logic and some bugfixes. A note on naming: Some existing list-specific helpers are renamed to 'datastructure_head', 'datastructure_node', etc. Probably a more concise and accurate naming would be something like 'ng_ds_head' for 'next-gen datastructure'. For folks who weren't following the conversations over past few months, though, such a naming scheme might seem to indicate that _all_ next-gen datastructures must have certain semantics, like release_on_unlock, which aren't necessarily required. For this reason I'd like some feedback on how to name things. Summary of patches: Patches 1, 2, and 10 are bugfixes which are likely worth applying independently of rbtree implementation. Patch 12 is somewhere between nice-to-have and bugfix. Patches 3 and 4 are nonfunctional refactor/rename. Patches 5 - 9 implement the meat of rbtree support in this series, gradually building up to implemented kfuncs that verify as expected. Patch 11 adds the bpf_rbtree_{add,first,remove} to bpf_experimental.h. Patch 13 adds tests. [0]: lore.kernel.org/bpf/20221118015614.2013203-1-memxor@xxxxxxxxx [1]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@xxxxxx [2]: lore.kernel.org/bpf/20221130082313.3241517-1-tj@xxxxxxxxxx Future work: Enabling writes to release_on_unlock refs should be done before the functionality of BPF rbtree can truly be considered complete. Implementing this proved more complex than expected so it's been pushed off to a future patch. Dave Marchevsky (13): bpf: Loosen alloc obj test in verifier's reg_btf_record bpf: map_check_btf should fail if btf_parse_fields fails bpf: Minor refactor of ref_set_release_on_unlock bpf: rename list_head -> datastructure_head in field info types bpf: Add basic bpf_rb_{root,node} support bpf: Add bpf_rbtree_{add,remove,first} kfuncs bpf: Add support for bpf_rb_root and bpf_rb_node in kfunc args bpf: Add callback validation to kfunc verifier logic bpf: Special verifier handling for bpf_rbtree_{remove, first} bpf, x86: BPF_PROBE_MEM handling for insn->off < 0 bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h libbpf: Make BTF mandatory if program BTF has spin_lock or alloc_obj type selftests/bpf: Add rbtree selftests arch/x86/net/bpf_jit_comp.c | 123 +++-- include/linux/bpf.h | 21 +- include/uapi/linux/bpf.h | 11 + kernel/bpf/btf.c | 181 ++++--- kernel/bpf/helpers.c | 75 ++- kernel/bpf/syscall.c | 33 +- kernel/bpf/verifier.c | 506 +++++++++++++++--- tools/include/uapi/linux/bpf.h | 11 + tools/lib/bpf/libbpf.c | 50 +- .../testing/selftests/bpf/bpf_experimental.h | 24 + .../selftests/bpf/prog_tests/linked_list.c | 12 +- .../testing/selftests/bpf/prog_tests/rbtree.c | 184 +++++++ tools/testing/selftests/bpf/progs/rbtree.c | 180 +++++++ .../progs/rbtree_btf_fail__add_wrong_type.c | 48 ++ .../progs/rbtree_btf_fail__wrong_node_type.c | 21 + .../testing/selftests/bpf/progs/rbtree_fail.c | 263 +++++++++ 16 files changed, 1549 insertions(+), 194 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree.c create mode 100644 tools/testing/selftests/bpf/progs/rbtree.c create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__add_wrong_type.c create mode 100644 tools/testing/selftests/bpf/progs/rbtree_btf_fail__wrong_node_type.c create mode 100644 tools/testing/selftests/bpf/progs/rbtree_fail.c -- 2.30.2