Re: [PATCH bpf-next 3/3] selftests/bpf: add veristat tool for mass-verifying BPF object files

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 9 Sep 2022 12:18:40 -0700

On Sun, Aug 28, 2022 at 11:02 PM Kui-Feng Lee <kuifeng@xxxxxx> wrote:
>
> On Fri, 2022-08-26 at 16:15 -0700, Andrii Nakryiko wrote:
> > Add a small tool, veristat, that allows mass-verification of
> > a set of *libbpf-compatible* BPF ELF object files. For each such
> > object
> > file, veristat will attempt to verify each BPF program
> > *individually*.
> > Regardless of success or failure, it parses BPF verifier stats and
> > outputs them in human-readable table format. In the future we can
> > also
> > add CSV and JSON output for more scriptable post-processing, if
> > necessary.
> >
> > veristat allows to specify a set of stats that should be output and
> > ordering between multiple objects and files (e.g., so that one can
> > easily order by total instructions processed, instead of default file
> > name, prog name, verdict, total instructions order).
> >
> > This tool should be useful for validating various BPF verifier
> > changes
> > or even validating different kernel versions for regressions.
> >
> > Here's an example for some of the heaviest selftests/bpf BPF object
> > files:
> >
> >   $ sudo ./veristat -s insns,file,prog
> > {pyperf,loop,test_verif_scale,strobemeta,test_cls_redirect,profiler}*
> > .linked3.o
> >   File
> > Program                               Verdict  Duration, us  Total
> > insns  Total states  Peak states

[...]

> >
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> >  tools/testing/selftests/bpf/.gitignore |   1 +
> >  tools/testing/selftests/bpf/Makefile   |   6 +-
> >  tools/testing/selftests/bpf/veristat.c | 541
> > +++++++++++++++++++++++++
> >  3 files changed, 547 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/testing/selftests/bpf/veristat.c
> >
> > diff --git a/tools/testing/selftests/bpf/.gitignore
> > b/tools/testing/selftests/bpf/.gitignore
> > index 3a8cb2404ea6..3b288562963e 100644
> > --- a/tools/testing/selftests/bpf/.gitignore
> > +++ b/tools/testing/selftests/bpf/.gitignore
> > @@ -39,6 +39,7 @@ test_cpp
> >  /tools
> >  /runqslower
> >  /bench
> > +/veristat
> >  *.ko
> >  *.tmp
> >  xskxceiver
>
> [...]
>
> > +
> > +static int process_prog(const char *filename, struct bpf_object
> > *obj, struct bpf_program *prog)
> > +{

[...]

> > +               bpf_object__for_each_program(tprog, tobj) {
> > +                       const char *tprog_name =
> > bpf_program__name(tprog);
> > +
> > +                       if (strcmp(prog_name, tprog_name) == 0) {
> > +                               bpf_program__set_autoload(tprog,
> > true);
> > +                               lprog = tprog;
> > +                       } else {
> > +                               bpf_program__set_autoload(tprog,
> > false);
> > +                       }
> > +               }
> > +
> > +               process_prog(filename, tobj, lprog);
> > +               bpf_object__close(tobj);
> > +       }
>
> It leaks obj.

good catch, fixed.

>
> > > +
> > +cleanup:
> > +       libbpf_set_print(old_libbpf_print_fn);
> > +       return err;
> > +}
> > +

[...]

> > +
> > +#define snappendf(dst, len, fmt,
> > args...)                                      \
> > +       len += snprintf(dst +
> > len,                                              \
> > +                             sizeof(dst) < len ? 0 : sizeof(dst) -
> > len,        \
> > +                             fmt, ##args)
>
> Never been used?

yep, dropped

>
> > +
> > +#define HEADER_CHAR '-'
> > +#define COLUMN_SEP "  "
> > +
> > +static void output_headers(bool calc_len)
> > +{
> > +       int i, len;
> > +
> > +       for (i = 0; i < env.output_spec.spec_cnt; i++) {
> > +               int id = env.output_spec.ids[i];
> > +               int *max_len = &env.output_spec.lens[i];
> > +
> > +               if (calc_len) {
> > +                       len = snprintf(NULL, 0, "%s",
> > stat_defs[id].header);
>
> Is there any reason to no use strlen()?

to keep it consistent with other snprintf() uses for calc_len case

> It would be ok to merge this block to one line since this function is
> always called exactly once before output_stats() with calc_len being
> true.  For example,
>
>  *max_len = strlen(...)
>

yes, but I wanted to keep it consistent and not have to keep track of
ordering dependencies. So I'd prefer to keep it as is.

>
> > +                       if (len > *max_len)
> > +                               *max_len = len;
> > +               } else {
> > +                       printf("%s%-*s", i == 0 ? "" : COLUMN_SEP,
> > *max_len, stat_defs[id].header);
> > +               }
> > +       }
> > +
> > +       if (!calc_len)
> > +               printf("\n");
> > +}
> > +
> > +static void output_lines(void)
>
> lines? It is confusing for me since it always prints exactly one line.
> How about something like output_sep_line()?

lines refers to "---------" under headers and at the bottom, so
multiple of them. sep (for separator) is also a bit misleading. I'll
call it output_header_underlines().

>
> > +{
> > +       int i, j, len;
> > +
> > +       for (i = 0; i < env.output_spec.spec_cnt; i++) {
> > +               len = env.output_spec.lens[i];
> > +
> > +               printf("%s", i == 0 ? "" : COLUMN_SEP);
> > +               for (j = 0; j < len; j++)
> > +                       printf("%c", HEADER_CHAR);
> > +       }
> > +       printf("\n");
> > +}
> > +

[...]

note: it's a good idea to trim irrelevant parts of the email to keep
it shorter and easier to read, if patch is long