Re: [PATCH bpf-next 15/17] selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Sat, 4 Mar 2023 15:28:03 -0800

On Sat, Mar 4, 2023 at 12:34 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Mar 02, 2023 at 03:50:13PM -0800, Andrii Nakryiko wrote:
> > Add bpf_for_each(), bpf_for() and bpf_repeat() macros that make writing
> > open-coded iterator-based loops much more convenient and natural. These
> > macro utilize cleanup attribute to ensure proper destruction of the
> > iterator and thanks to that manage to provide an ergonomic very close to
> > C language for construct. Typical integer loop would look like:
> >
> >   int i;
> >   int arr[N];
> >
> >   bpf_for(i, 0, N) {
> >   /* verifier will know that i >= 0 && i < N, so could be used to
> >    * directly access array elements with no extra checks
> >    */
> >    arr[i] = i;
> >   }
> >
> > bpf_repeat() is very similar, but it doesn't expose iteration number and
> > is mean as a simple "repeat action N times":
> >
> >   bpf_repeat(N) { /* whatever */ }
> >
> > Note that break and continue inside the {} block work as expected.
> >
> > bpf_for_each() is a generalization over any kind of BPF open-coded
> > iterator allowing to use for-each-like approach instead of calling
> > low-level bpf_iter_<type>_{new,next,destroy}() APIs explicitly. E.g.:
> >
> >   struct cgroup *cg;
> >
> >   bpf_for_each(cgroup, cg, some, input, args) {
> >       /* do something with each cg */
> >   }
> >
> > Would call (right now hypothetical) bpf_iter_cgroup_{new,next,destroy}()
> > functions to form a loop over cgroups, where `some, input, args` are
> > passed verbatim into constructor as
> > bpf_iter_cgroup_new(&it, some, input, args).
> >
> > As a demonstration, add pyperf variant based on bpf_for() loop.
> >
> > Also clean up few tests that either included bpf_misc.h header
> > unnecessarily from user-space or included it before any common types are
> > defined.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> >  .../bpf/prog_tests/bpf_verif_scale.c          |  6 ++
> >  .../bpf/prog_tests/uprobe_autoattach.c        |  1 -
> >  tools/testing/selftests/bpf/progs/bpf_misc.h  | 76 +++++++++++++++++++
> >  tools/testing/selftests/bpf/progs/lsm.c       |  4 +-
> >  tools/testing/selftests/bpf/progs/pyperf.h    | 14 +++-
> >  .../selftests/bpf/progs/pyperf600_iter.c      |  7 ++
> >  .../selftests/bpf/progs/pyperf600_nounroll.c  |  3 -
> >  7 files changed, 101 insertions(+), 10 deletions(-)
> >  create mode 100644 tools/testing/selftests/bpf/progs/pyperf600_iter.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> > index 5ca252823294..731c343897d8 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> > @@ -144,6 +144,12 @@ void test_verif_scale_pyperf600_nounroll()
> >       scale_test("pyperf600_nounroll.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> >  }
> >
> > +void test_verif_scale_pyperf600_iter()
> > +{
> > +     /* open-coded BPF iterator version */
> > +     scale_test("pyperf600_iter.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> > +}
> > +
> >  void test_verif_scale_loop1()
> >  {
> >       scale_test("loop1.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false);
> > diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > index 6558c857e620..d5b3377aa33c 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > @@ -3,7 +3,6 @@
> >
> >  #include <test_progs.h>
> >  #include "test_uprobe_autoattach.skel.h"
> > -#include "progs/bpf_misc.h"
> >
> >  /* uprobe attach point */
> >  static noinline int autoattach_trigger_func(int arg1, int arg2, int arg3,
> > diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
> > index f704885aa534..08a791f307a6 100644
> > --- a/tools/testing/selftests/bpf/progs/bpf_misc.h
> > +++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
> > @@ -75,5 +75,81 @@
> >  #define FUNC_REG_ARG_CNT 5
> >  #endif
> >
> > +struct bpf_iter;
> > +
> > +extern int bpf_iter_num_new(struct bpf_iter *it__uninit, int start, int end) __ksym;
> > +extern int *bpf_iter_num_next(struct bpf_iter *it) __ksym;
> > +extern void bpf_iter_num_destroy(struct bpf_iter *it) __ksym;
> > +
> > +#ifndef bpf_for_each
> > +/* bpf_for_each(iter_kind, elem, args...) provides generic construct for using BPF
> > + * open-coded iterators without having to write mundane explicit low-level
> > + * loop. Instead, it provides for()-like generic construct that can be used
> > + * pretty naturally. E.g., for some hypothetical cgroup iterator, you'd write:
> > + *
> > + * struct cgroup *cg, *parent_cg = <...>;
> > + *
> > + * bpf_for_each(cgroup, cg, parent_cg, CG_ITER_CHILDREN) {
> > + *     bpf_printk("Child cgroup id = %d", cg->cgroup_id);
> > + *     if (cg->cgroup_id == 123)
> > + *         break;
> > + * }
> > + *
> > + * I.e., it looks almost like high-level for each loop in other languages,
> > + * supports continue/break, and is verifiable by BPF verifier.
> > + *
> > + * For iterating integers, the difference betwen bpf_for_each(num, i, N, M)
> > + * and bpf_for(i, N, M) is in that bpf_for() provides additional proof to
> > + * verifier that i is in [N, M) range, and in bpf_for_each() case i is `int
> > + * *`, not just `int`. So for integers bpf_for() is more convenient.
> > + */
> > +#define bpf_for_each(type, cur, args...) for (                                                 \
> > +     /* initialize and define destructor */                                            \
> > +     struct bpf_iter ___it __attribute__((cleanup(bpf_iter_##type##_destroy))),        \
>
> We should probably say somewhere that it requires C99 with some flag that allows
> declaring variables inside the loop.

yes, I'll add a comment. I think cleanup attribute isn't standard as
well, I'll mention it. This shouldn't be restrictive, though, as we
expect very modern Clang (or eventually GCC), which definitely will
support all of that. And I feel like most people don't restrict their
BPF-side code to strict C89 anyways.

>
> Also what are the rules for attr(cleanup()).
> When does it get called?

>From GCC documentation:

  > The cleanup attribute runs a function when the variable goes out of scope.

So given ___it is bound to for loop, any code path that leads to loop
exit (so, when condition turns false or *breaking* out of the loop,
which is why I use cleanup, this was a saving grace for this approach
to work at all).

> My understanding that the visibility of ___it is only within for() body.

right

> So when the prog does:
> bpf_for(i, 0, 10) sum += i;
> bpf_for(i, 0, 10) sum += i;
>
> the compiler should generate bpf_iter_num_destroy right after each bpf_for() ?

Conceptually, yes, but see the note about breaking out of the loop
above. How actual assembly code is generated is beyond our control. If
the compiler generates multiple separate code paths, each with its own
destroy, that's fine as well. No assumptions are made in the verifier,
we just need to see one bpf_iter_<type>_destroy() for each instance of
iterator.

> Or will it group them at the end of function body and destroy all iterators ?

That would be a bug, as documentation states that clean up happens as
soon as a variable goes out of scope. Delaying clean up could result
in program logic bugs. I.e., we rely on destructors to be called as
soon as possible.

> Will compiler reuse the stack space used by ___it in case there are multiple bpf_for-s ?

That's the question to compiler developers, but I'd assume that, yes,
it should. Why not?

And looking at, for example, iter_pass_iter_ptr_to_subprog which has 4
sequential bpf_for() loops:

0000000000002328 <iter_pass_iter_ptr_to_subprog>:
    1125:       bf a6 00 00 00 00 00 00 r6 = r10
    1126:       07 06 00 00 28 ff ff ff r6 += -216    <-- THIS IS ITER
    1127:       bf 61 00 00 00 00 00 00 r1 = r6
    1128:       b4 02 00 00 00 00 00 00 w2 = 0
    1129:       b4 03 00 00 10 00 00 00 w3 = 16
    1130:       85 10 00 00 ff ff ff ff call -1
                0000000000002350:  R_BPF_64_32  bpf_iter_num_new
    1131:       bf a7 00 00 00 00 00 00 r7 = r10
    1132:       07 07 00 00 c0 ff ff ff r7 += -64
    1133:       bf 61 00 00 00 00 00 00 r1 = r6
    1134:       bf 72 00 00 00 00 00 00 r2 = r7
    1135:       b4 03 00 00 10 00 00 00 w3 = 16
    1136:       b4 04 00 00 02 00 00 00 w4 = 2
    1137:       85 10 00 00 53 00 00 00 call 83
                0000000000002388:  R_BPF_64_32  .text
    1138:       bf 61 00 00 00 00 00 00 r1 = r6
    1139:       85 10 00 00 ff ff ff ff call -1
                0000000000002398:  R_BPF_64_32  bpf_iter_num_destroy
    1140:       bf 61 00 00 00 00 00 00 r1 = r6     <<--- HERE WE REUSE
    1141:       b4 02 00 00 00 00 00 00 w2 = 0
    1142:       b4 03 00 00 20 00 00 00 w3 = 32
    1143:       85 10 00 00 ff ff ff ff call -1
                00000000000023b8:  R_BPF_64_32  bpf_iter_num_new

Note that r6 is set to fp-216 and is just reused as is for second
bpf_for loop (second bpf_iter_num_new) call.

>
> > +     /* ___p pointer is just to call bpf_iter_##type##_new() *once* to init ___it */   \
> > +                     *___p = (bpf_iter_##type##_new(&___it, ##args),           \
> > +     /* this is a workaround for Clang bug: it currently doesn't emit BTF */           \
> > +     /* for bpf_iter_##type##_destroy when used from cleanup() attribute */            \
> > +                             (void)bpf_iter_##type##_destroy, (void *)0);              \
> > +     /* iteration and termination check */                                             \
> > +     ((cur = bpf_iter_##type##_next(&___it)));                                         \
> > +     /* nothing here  */                                                               \
> > +)
> > +#endif /* bpf_for_each */
> > +
> > +#ifndef bpf_for
> > +/* bpf_for(i, start, end) proves to verifier that i is in [start, end) */
> > +#define bpf_for(i, start, end) for (                                                   \
> > +     /* initialize and define destructor */                                            \
> > +     struct bpf_iter ___it __attribute__((cleanup(bpf_iter_num_destroy))),             \
> > +     /* ___p pointer is necessary to call bpf_iter_num_new() *once* to init ___it */   \
> > +                     *___p = (bpf_iter_num_new(&___it, (start), (end)),                \
> > +     /* this is a workaround for Clang bug: it currently doesn't emit BTF */           \
> > +     /* for bpf_iter_num_destroy when used from cleanup() attribute */                 \
> > +                             (void)bpf_iter_num_destroy, (void *)0);                   \
> > +     ({                                                                                \
> > +             /* iteration step */                                                      \
> > +             int *___t = bpf_iter_num_next(&___it);                                    \
> > +             /* termination and bounds check */                                        \
> > +             (___t && ((i) = *___t, i >= (start) && i < (end)));                       \
>
> The i >= (start) && i < (end) is necessary to make sure that the verifier
> tightens the range of 'i' inside the body of the loop and
> when the program does arr[i] access the verifier will know that 'i' is within bounds, right?

yes, it feels like a common pattern, but I was contemplating to add
bpf_for_uncheck() where we "expose" i value, but don't do check. I
decided to keep it simple, as most examples actually required bounds
checks anyways. And for cases where we don't, often bpf_repeat()
suffices. One other way would be to expose i from bpf_repeat(), just
no checks.

One restriction with this approach is that I can't define both `struct
bpf_iter __it` and `int i` inside for loop, so cur/i has to be
declared and passed into bpf_for/bpf_for_each by user explicitly. So
for bpf_repeat() to expose i would require always doing:

int i;

bpf_repeat(i, 100) { /* i is set to 0, 1, ..., 99 */ }

which, if you don't care about iteration number, is a bit of waste. So
don't know, I'm undecided on bpf_repeat with i.

>
> In such case should we add __builtin_constant_p() check for 'start' and 'end' ?

that seems to defy the purpose, as if you know start/end statically,
you might as well just write either unrolled loop or bounded for()
loop

> int arr[100];
> if (var < 100)
>   bpf_for(i, 0, global_var) sum += arr[i];
> will fail the verifier and the users might complain of dumb verifier.

but I'm not following this example, so maybe the answer to above would
be different if I would. What's var and global_var? Are they supposed
to be the same thing? If yes, why would that bpf_for() loop fail?

I suspect you are conflating the other pattern I pointed out with:

int cnt = 0;

bpf_for_each(...) {
   if (cnt++ > 100)
      break;
}

It's different, as cnt comes from outside the loop and is updated on
each iteration. While for

bpf_for(i, 0, var) {
   if (i > 100)
        break;
}

it should work fine, as i is originally unknowable, then narrowed to
*a range* [0, 99]. But that [0, 99] knowledge is part of "inner loop
body" state, so it will still converge as it's going to be basically
ignored during the equivalence check on next bpf_iter_num_next() (old
state didn't know about i yet).

>
> Also if start and end are variables they potentially can change between bpf_iter_num_new()
> and in each iteration of the loop.
> __builtin_constant_p() might be too restrictive.

yep, I think so

> May be read start/end once, at least?

I can't do that, as for() allows only one type of variables to be
defined (note `*___p` hack as well), so there is no place to remember
start/end, unfortunately...

So it's a tradeoff. I can drop range validation, but then every
example and lots of production code would be re-checking these
conditions.

>
> > +     });                                                                               \
> > +     /* nothing here  */                                                               \
> > +)
> > +#endif /* bpf_for */
> > +