On Mon, Nov 22, 2021 at 5:44 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Mon, Nov 22, 2021 at 4:43 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Mon, Nov 22, 2021 at 3:47 PM Andrii Nakryiko
> > <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > >
> > > > Given the BPF program's BTF, perform a linear search through kernel BTFs for
> > > > a possible candidate.
> > > > Then wire the result into bpf_core_apply_relo_insn().
> > > >
> > > > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > > ---
> > > >  kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 135 insertions(+), 1 deletion(-)
> > > >
> > >
> > > [...]
> > >
> > > >  int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
> > > >  		   int relo_idx, void *insn)
> > > >  {
> > > > -	return -EOPNOTSUPP;
> > > > +	if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
> > > > +		struct bpf_core_cand_list *cands;
> > > > +
> > > > +		cands = bpf_core_find_cands(ctx, relo->type_id);
> > >
> > > this is wrong for many reasons:
> > >
> > > 1. you will overwrite the previous ctx->cands if it was already set,
> > > which leaks memory
> > > 2. this list of candidates should be keyed by relo->type_id ("root
> > > type"). Different root types get their own independent lists, so it
> > > has to be some sort of lookup table from type_id to a list of
> > > candidates.
> > >
> > > 2) means that if you had a bunch of relos against struct task_struct,
> > > you'll create a list of candidates when processing the first relo that
> > > starts at task_struct. All the subsequent relos that have task_struct
> > > as root type will reuse that list and potentially trim it down. If
> > > there are some other relos against, say, struct mm_struct, they will
> > > have their own independent list of candidates.
> >
> > right.
> > Your prior comment confused me. I didn't do this reuse of cands
> > to avoid introducing a hashtable here like libbpf does,
> > since it does too little to actually help.
>
> Since when is avoiding millions of iterations for each relocation "too
> little help"?
>
> BTW, I've tried to measure how noticeable this will be and added
> test_verif_scale2 to core_kern with the only change being to use vmlinux.h (so
> that __sk_buff accesses are CO-RE relocated). I didn't manage to get
> it loaded, so something else is going on there. So please take a look, I
> don't really know how to debug lskel stuff.
> Here are the changes:
>
> diff --git a/tools/testing/selftests/bpf/progs/core_kern.c
> b/tools/testing/selftests/bpf/progs/core_kern.c
> index 3b4571d68369..9916cf059883 100644
> --- a/tools/testing/selftests/bpf/progs/core_kern.c
> +++ b/tools/testing/selftests/bpf/progs/core_kern.c
> @@ -4,6 +4,10 @@
>
>  #include <bpf/bpf_helpers.h>
>  #include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_core_read.h>
> +
> +#define ATTR __always_inline
> +#include "test_jhash.h"
>
>  struct {
>  	__uint(type, BPF_MAP_TYPE_ARRAY);
> @@ -57,4 +61,27 @@ int BPF_PROG(fexit_eth_type_trans, struct sk_buff *skb,
>  	return randmap(dev->ifindex + skb->len);
>  }
>
> +SEC("tc")
> +int balancer_ingress(struct __sk_buff *ctx)
> +{
> +	void *data_end = (void *)(long)ctx->data_end;
> +	void *data = (void *)(long)ctx->data;
> +	void *ptr;
> +	int ret = 0, nh_off, i = 0;
> +
> +	nh_off = 14;
> +
> +	/* pragma unroll doesn't work on large loops */
> +
> +#define C do { \
> +	ptr = data + i; \
> +	if (ptr + nh_off > data_end) \
> +		break; \
> +	ctx->tc_index = jhash(ptr, nh_off, ctx->cb[0] + i++); \
> +	} while (0);
> +#define C30 C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;C;
> +	C30;C30;C30; /* 90 calls */
> +	return 0;
> +}
> +
>  char LICENSE[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/progs/test_verif_scale2.c
> b/tools/testing/selftests/bpf/progs/test_verif_scale2.c
> index f024154c7be7..9e2c2a6954cb 100644
> --- a/tools/testing/selftests/bpf/progs/test_verif_scale2.c
> +++ b/tools/testing/selftests/bpf/progs/test_verif_scale2.c
> @@ -1,6 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  // Copyright (c) 2019 Facebook
> -#include <linux/bpf.h>
> +#include "vmlinux.h"
>  #include <bpf/bpf_helpers.h>
>  #define ATTR __always_inline
>  #include "test_jhash.h"
>
>
> The version with libbpf does load successfully, while the lskel one gives:
>
> #35 core_kern_lskel:FAIL
> test_core_kern_lskel:FAIL:open_and_load unexpected error: -22
>
> The same program with the libbpf skeleton loaded successfully.
>
> If you manage to get it working, you'll have a BPF program with 181 CO-RE
> relocations, let's see how noticeable it will be to do 181 iterations
> over ~2mln vmlinux BTF types.

Just realized that I was off by one order of magnitude, it's about 100K
types, not 2 million. But the point stands, it is quite a lot.

>
> > I think I will go back to the prior version: linear search for every relo.
>
> I don't understand the resistance, the kernel has both rbtrees and
> hashmaps, what's the problem with just using those?
>
> > If we actually need to optimize this part of loading
> > we'd better do a persistent cache of
> > name -> kernel btf_type-s
> > and reuse it across different programs.
>
> You can't reuse it because the same type name in a BPF object's BTF can
> be resolved to different kernel types (e.g., if we still had the
> ring_buffer name collision), so this is all per BPF object (or at
> least per BPF program). We still have duplicated types in some
> fuller kernel configurations, and it can always happen down the road.
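
To make it concrete, this is roughly the kind of lookup table I have in
mind (an untested sketch only: names like cand_cache_entry and
cand_cache_lookup_or_find are made up, locking and freeing on program
teardown are omitted, the table would hang off bpf_core_ctx or the
bpf_prog rather than being a global, and I'm assuming here that
bpf_core_find_cands() returns an ERR_PTR on failure):

#include <linux/err.h>
#include <linux/hashtable.h>
#include <linux/slab.h>

/* one entry per "root type" of a CO-RE relocation */
struct cand_cache_entry {
	u32 local_type_id;		/* relo->type_id */
	struct bpf_core_cand_list *cands;
	struct hlist_node node;
};

/* sketch only: in reality this would live in bpf_core_ctx, not be global */
DEFINE_HASHTABLE(cand_cache, 4);

static struct bpf_core_cand_list *
cand_cache_lookup_or_find(struct bpf_core_ctx *ctx, u32 type_id)
{
	struct cand_cache_entry *e;

	/* all relos rooted at the same local type share one candidate list */
	hash_for_each_possible(cand_cache, e, node, type_id)
		if (e->local_type_id == type_id)
			return e->cands;

	e = kzalloc(sizeof(*e), GFP_KERNEL);
	if (!e)
		return ERR_PTR(-ENOMEM);

	/* only the first relo for this root type pays for the linear BTF scan */
	e->cands = bpf_core_find_cands(ctx, type_id);
	if (IS_ERR(e->cands)) {
		struct bpf_core_cand_list *err = e->cands;

		kfree(e);
		return err;
	}

	e->local_type_id = type_id;
	hash_add(cand_cache, &e->node, type_id);
	return e->cands;
}

With something along these lines each root type does the linear scan over
vmlinux BTF once, and every other relocation against the same root type
becomes a hash lookup.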