This series adds support for refcounted local kptrs to the verifier. A local kptr is 'refcounted' if its type contains a struct bpf_refcount field: struct refcounted_node { long data; struct bpf_list_node ll; struct bpf_refcount ref; }; bpf_refcount is used to implement shared ownership for local kptrs. Motivating usecase ================== If a struct has two collection node fields, e.g.: struct node { long key; long val; struct bpf_rb_node rb; struct bpf_list_node ll; }; It's not currently possible to add a node to both the list and rbtree: long bpf_prog(void *ctx) { struct node *n = bpf_obj_new(typeof(*n)); if (!n) { /* ... */ } bpf_spin_lock(&lock); bpf_list_push_back(&head, &n->ll); bpf_rbtree_add(&root, &n->rb, less); /* Assume a resonable less() */ bpf_spin_unlock(&lock); } The above program will fail verification due to current owning / non-owning ref logic: after bpf_list_push_back, n is a non-owning reference and thus cannot be passed to bpf_rbtree_add. The only way to get an owning reference for the node that was added is to bpf_list_pop_{front,back} it. More generally, verifier ownership semantics expect that a node has one owner (program, collection, or stashed in map) with exclusive ownership of the node's lifetime. The owner free's the node's underlying memory when it itself goes away. Without a shared ownership concept it's impossible to express many real-world usecases such that they pass verification. Semantic Changes ================ Before this series, the verifier could make this statement: "whoever has the owning reference has exclusive ownership of the referent's lifetime". As demonstrated in the previous section, this implies that a BPF program can't have an owning reference to some node if that node is in a collection. If such a state were possible, the node would have multiple owners, each thinking they have exclusive ownership. In order to support shared ownership it's necessary to modify the exclusive ownership semantic. After this series' changes, an owning reference has ownership of the referent's lifetime, but it's not necessarily exclusive. The referent's underlying memory is guaranteed to be valid (i.e. not free'd) until the reference is dropped or used for collection insert. This change doesn't affect UX of owning or non-owning references much: * insert kfuncs (bpf_rbtree_add, bpf_list_push_{front,back}) still require an owning reference arg, as ownership still must be passed to the collection in a shared-ownership world. * non-owning references still refer to valid memory without claiming any ownership. One important conclusion that followed from "exclusive ownership" statement is no longer valid, though. In exclusive-ownership world, if a BPF prog has an owning reference to a node, the verifier can conclude that no collection has ownership of it. This conclusion was used to avoid runtime checking in the implementations of insert and remove operations (""has the node already been {inserted, removed}?"). In a shared-ownership world the aforementioned conclusion is no longer valid, which necessitates doing runtime checking in insert and remove operation kfuncs, and those functions possibly failing to insert or remove anything. Luckily the verifier changes necessary to go from exclusive to shared ownership were fairly minimal. Patches in this series which do change verifier semantics generally have some summary dedicated to explaining why certain usecases Just Work for shared ownership without verifier changes. Implementation ============== The changes in this series can be categorized as follows: * struct bpf_refcount opaque field + plumbing * support for refcounted kptrs in bpf_obj_new and bpf_obj_drop * bpf_refcount_acquire kfunc * enables shared ownershp by bumping refcount + acquiring owning ref * support for possibly-failing collection insertion and removal * insertion changes are more complex If a patch's changes have some nuance to their effect - or lack of effect - on verifier behavior, the patch summary talks about it at length. Patch contents: * Patch 1 removes btf_field_offs struct * Patch 2 adds struct bpf_refcount and associated plumbing * Patch 3 modifies semantics of bpf_obj_drop and bpf_obj_new to handle refcounted kptrs * Patch 4 adds bpf_refcount_acquire * Patches 5-7 add support for possibly-failing collection insert and remove * Patch 8 centralizes constructor-like functionality for local kptr types * Patch 9 adds tests for new functionality base-commit: c4d3b488a90be95f4f9413dc7eae5fc113d15fe9 Dave Marchevsky (9): bpf: Remove btf_field_offs, use btf_record's fields instead bpf: Introduce opaque bpf_refcount struct and add btf_record plumbing bpf: Support refcounted local kptrs in existing semantics bpf: Add bpf_refcount_acquire kfunc bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail selftests/bpf: Modify linked_list tests to work with macro-ified inserts bpf: Migrate bpf_rbtree_remove to possibly fail bpf: Centralize btf_field-specific initialization logic selftests/bpf: Add refcounted_kptr tests include/linux/bpf.h | 85 ++-- include/linux/bpf_verifier.h | 7 +- include/linux/btf.h | 2 - include/uapi/linux/bpf.h | 4 + kernel/bpf/btf.c | 126 ++---- kernel/bpf/helpers.c | 116 +++-- kernel/bpf/map_in_map.c | 15 - kernel/bpf/syscall.c | 23 +- kernel/bpf/verifier.c | 153 +++++-- tools/include/uapi/linux/bpf.h | 4 + .../testing/selftests/bpf/bpf_experimental.h | 60 ++- .../selftests/bpf/prog_tests/linked_list.c | 8 +- .../testing/selftests/bpf/prog_tests/rbtree.c | 25 ++ .../bpf/prog_tests/refcounted_kptr.c | 18 + .../testing/selftests/bpf/progs/linked_list.c | 34 +- .../testing/selftests/bpf/progs/linked_list.h | 4 +- .../selftests/bpf/progs/linked_list_fail.c | 96 ++-- tools/testing/selftests/bpf/progs/rbtree.c | 74 +++- .../testing/selftests/bpf/progs/rbtree_fail.c | 77 ++-- .../selftests/bpf/progs/refcounted_kptr.c | 410 ++++++++++++++++++ .../bpf/progs/refcounted_kptr_fail.c | 72 +++ 21 files changed, 1066 insertions(+), 347 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c create mode 100644 tools/testing/selftests/bpf/progs/refcounted_kptr.c create mode 100644 tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c -- 2.34.1