On Sat, Apr 15, 2023 at 01:18:07PM -0700, Dave Marchevsky wrote:
> -extern void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
> -			    bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b)) __ksym;
> +extern int bpf_rbtree_add_impl(struct bpf_rb_root *root, struct bpf_rb_node *node,
> +			       bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b),
> +			       void *meta, __u64 off) __ksym;
> +
> +/* Convenience macro to wrap over bpf_rbtree_add_impl */
> +#define bpf_rbtree_add(head, node, less) bpf_rbtree_add_impl(head, node, less, NULL, 0)

Applied, but can we do better here?

It's not a new issue. We have the same inefficiency in bpf_obj_drop.
The BPF program populates 1 or 2 extra registers, but the verifier patches
the call insn with the necessary values for R4 and R5 for bpf_rbtree_add_impl,
or R2 for bpf_obj_drop_impl.
So the one or two register assignments by the bpf prog are dead code.
Can we come up with a way to avoid this unnecessary register assignment
in the bpf prog?

Can we keep
extern void bpf_rbtree_add(root, node, less) __ksym;
both in the kernel and in bpf_experimental.h, so that libbpf's
bpf_object__resolve_ksym_func_btf_id() -> bpf_core_types_are_compat()
check will succeed, but the kernel bpf_rbtree_add will actually have
5 arguments?

Maybe there is an always_inline or __attribute__((alias(..))) trick we can use?
Or define both and patch the bpf code to use _impl later?

@@ -2053,6 +2053,12 @@ __bpf_kfunc int bpf_rbtree_add_impl(struct bpf_rb_root *root, struct bpf_rb_node
 	return __bpf_rbtree_add(root, node, (void *)less, meta ? meta->record : NULL, off);
 }
 
+__bpf_kfunc notrace int bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+				       bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b))
+{
+	return 0;
+}
+

This only wastes 3 bytes of .text on x86 plus an extra BTF_KIND_FUNC in
vmlinux BTF, but will save two register assignments at run-time?
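
For illustration, the alias variant could look roughly like this (a
hypothetical sketch, not tested: gcc warns about aliases with mismatched
prototypes via -Wattribute-alias, so it would need the usual __diag
suppression, and I haven't checked what pahole/BTF dedup makes of it):

/* Sketch: expose a 3-arg prototype in vmlinux BTF while the symbol
 * itself resolves to the 5-arg bpf_rbtree_add_impl. The verifier still
 * patches R4/R5 at the call site, so the prog never has to set them.
 */
int bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
		   bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b))
	__attribute__((alias("bpf_rbtree_add_impl")));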
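
On the program side the extern could then go back to the plain 3-arg
form with no wrapper macro (sketch for bpf_experimental.h, assuming one
of the kernel-side variants above lands):

/* The compiled prog only sets up R1-R3; the verifier rewrites the call
 * to bpf_rbtree_add_impl and supplies R4 (meta) and R5 (off) itself,
 * instead of the prog loading NULL/0 into them as dead insns.
 */
extern int bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
			  bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b)) __ksym;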