Re: [PATCH v3 bpf-next 06/13] bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 22 Nov 2021 19:44:39 -0800

On Mon, Nov 22, 2021 at 7:15 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Nov 22, 2021 at 5:44 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Mon, Nov 22, 2021 at 4:43 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Mon, Nov 22, 2021 at 3:47 PM Andrii Nakryiko
> > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > >
> > > > On Fri, Nov 19, 2021 at 7:33 PM Alexei Starovoitov
> > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > >
> > > > > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > > >
> > > > > Given BPF program's BTF perform a linear search through kernel BTFs for
> > > > > a possible candidate.
> > > > > Then wire the result into bpf_core_apply_relo_insn().
> > > > >
> > > > > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > > > ---
> > > > >  kernel/bpf/btf.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
> > > > >  1 file changed, 135 insertions(+), 1 deletion(-)
> > > > >
> > > >
> > > > [...]
> > > >
> > > > >  int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
> > > > >                    int relo_idx, void *insn)
> > > > >  {
> > > > > -       return -EOPNOTSUPP;
> > > > > +       if (relo->kind != BPF_CORE_TYPE_ID_LOCAL) {
> > > > > +               struct bpf_core_cand_list *cands;
> > > > > +
> > > > > +               cands = bpf_core_find_cands(ctx, relo->type_id);
> > > >
> > > > this is wrong for many reasons:
> > > >
> > > > 1. you will overwrite previous ctx->cands, if it was already set,
> > > > which leaks memory
> > > > 2. this list of candidates should be keyed by relo->type_id ("root
> > > > type"). Different root types get their own independent lists; so it
> > > > has to be some sort of look up table from type_id to a list of
> > > > candidates.
> > > >
> > > > 2) means that if you had a bunch of relos against struct task_struct,
> > > > you'll crate a list of candidates when processing first relo that
> > > > starts at task_struct. All the subsequent relos that have task_struct
> > > > as root type will re-used that list and potentially trim it down. If
> > > > there are some other relos against, say, struct mm_struct, they will
> > > > have their independent list of candidates.
> > >
> > > right.
> > > Your prior comment confused me. I didn't do this reuse of cands
> > > to avoid introducing hashtable here like libbpf does,
> > > since it does too little to actually help.
> >
> > Since when avoiding millions of iterations for each relocation is "too
> > little help"?
>
> because it is a premature optimization for a case that
> may or may not be relevant.
> If 180 sk_buff relocations somehow makes the loading too slow
> 180 relocations of 180 different types would produce exactly
> the same slow down and hashtable cache won't help.

Likelihood of using 180 different root types in real application is
very small. Using 180 relocations (either with explicit BPF_CORE_READ,
field accesses in fentry, or just through always_inline helpers doing
either and being inlined in multiple places) is very real in
real-world non-trivial applications. And the cost of optimizing that
in the kernel later is very high, you'll be waiting for a new kernel
release to get everywhere to rely on this optimization. The cost of
further optimizing this in libbpf is much smaller (and libbpf still
did the optimization from the get go and I stand by that decision).

If you think I'm making this up, we have one security-related BPF app
with 1076 CO-RE relocations across 11 BPF programs. It's using 22
different root types.

>
> > BTW, I've tried to measure how noticeable this will be and added
> > test_verif_scale2 to core_kern with only change to use vmlinux.h (so
> > that __sk_buff accesses are CO-RE relocated). I didn't manage to get
> > it loaded, so something else is going there. So please take a look, I
> > don't really know how to debug lskel stuff. Here are the changes:
>
> Looking. Thanks for the test.
>
> > > If we actually need to optimize this part of loading
> > > we better do persistent cache of
> > > name -> kernel btf_type-s
> > > and reuse it across different programs.
> >
> > You can't reuse it because the same type name in a BPF object BTF can
> > be resolved to different kernel types (e.g., if we still had
> > ring_buffer name collision),
>
> well and the candidate list will have two kernel types with the same name.
> Both kept in a cache.
> so cache_lookup("ring_buffer") will return two kernel types.
> That would be the case for all progs being loaded.
> What am I missing?

if there are two matching types with the same matching field but field
offsets are different (and thus there is ambiguity), that's an error.
So the correct (by current definition, at least) program has to result
in one of such two incompatible ring_buffer types and only one. If
there are multiple duplicates, though, (like task_struct and
task_struct___2) they will have identical sets of fields at the same
offsets, so both will remain possible candidates and that's not an
error. But for actually two different types, there can be only one
winner.