Re: [PATCH v3 bpf-next 02/12] libbpf: implement BPF CO-RE offset relocation algorithm

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Sat, 3 Aug 2019 09:25:58 -0700

On Fri, Aug 02, 2019 at 11:30:21PM -0700, Andrii Nakryiko wrote:
> 
> No, not anonymous.
> 
> struct my_struct___local {
>     int a;
> };
> 
> struct my_struct___target {
>     long long a;
> };
> 
> my_struct___local->a will not match my_struct___target->a, but it's
> not a reason to stop relocation process due to error.

why? It feels that this is exactly the reason to fail relocation.
struct names matched. field names matched.
but the types are different. Why proceed further?

Also what about str->a followed by str->b.
Is it possible that str->a will pick up one flavor when str->b different one?
That will likely lead to wrong behavior?

> 
> All the tests I added use non-numeric flavors. While technically I can
> use just ___1, ___2 and so on, it will greatly reduce readability,
> while not really solving any problem (nothing prevents someone to add
> something like lmc___1 eventually).
> 
> I think it's not worth it to complicate this logic just for
> lmc___{softc,media,ctl}, but we can do 2) - try to match any struct as
> is. If that fails, see if it's a "flavor" and match flavors.

Could you please share benchmarking results of largish bpf prog
with couple thousands relocations against typical vmlinux.h ?
I'm concerned that this double check will be noticeable.
May be llvm should recognize "flavor" in the type name and
encode them differently in BTF ?
Or add a pre-pass to libbpf to sort out all types into flavored and not.
If flavored search is expensive may be all flavors could be a linked list
from the base type. The typical case is one or two flavors, right?

> > > > > +     for (i = 0, j = 0; i < cand_ids->len; i++) {
> > > > > +             cand_id = cand_ids->data[i];
> > > > > +             cand_type = btf__type_by_id(targ_btf, cand_id);
> > > > > +             cand_name = btf__name_by_offset(targ_btf, cand_type->name_off);
> > > > > +
> > > > > +             err = bpf_core_spec_match(&local_spec, targ_btf,
> > > > > +                                       cand_id, &cand_spec);
> > > > > +             if (err < 0) {
> > > > > +                     pr_warning("prog '%s': relo #%d: failed to match spec ",
> > > > > +                                prog_name, relo_idx);
> > > > > +                     bpf_core_dump_spec(LIBBPF_WARN, &local_spec);
> > > > > +                     libbpf_print(LIBBPF_WARN,
> > > > > +                                  " to candidate #%d [%d] (%s): %d\n",
> > > > > +                                  i, cand_id, cand_name, err);
> > > > > +                     return err;
> > > > > +             }
> > > > > +             if (err == 0) {
> > > > > +                     pr_debug("prog '%s': relo #%d: candidate #%d [%d] (%s) doesn't match spec ",
> > > > > +                              prog_name, relo_idx, i, cand_id, cand_name);
> > > > > +                     bpf_core_dump_spec(LIBBPF_DEBUG, &local_spec);
> > > > > +                     libbpf_print(LIBBPF_DEBUG, "\n");
> > > > > +                     continue;
> > > > > +             }
> > > > > +
> > > > > +             pr_debug("prog '%s': relo #%d: candidate #%d matched as spec ",
> > > > > +                      prog_name, relo_idx, i);
> > > >
> > > > did you mention that you're going to make a helper for this debug dumps?
> > >
> > > yeah, I added bpf_core_dump_spec(), but I don't know how to shorten
> > > this further... This output is extremely useful to understand what's
> > > happening and will be invaluable when users will inevitably report
> > > confusing behavior in some cases, so I still want to keep it.
> >
> > not sure yet. Just pointing out that this function has more debug printfs
> > than actual code which doesn't look right.
> > We have complex algorithms in the kernel (like verifier).
> > Yet we don't sprinkle printfs in there to this degree.
> >
> 
> We do have a verbose verifier logging, though, exactly to help users
> to debug issues, which is extremely helpful and is greatly appreciated
> by users.
> There is nothing worse for developer experience than getting -EINVAL
> without any useful log message. Been there, banged my head against the
> wall wishing for a bit more verbose log. What are we trying to
> optimize for here?

All I'm saying that three printfs in a row that essentially convey the same info
look like clowntown. Some level of verbosity is certainly useful.