On Sat, Mar 04, 2023 at 03:28:03PM -0800, Andrii Nakryiko wrote: > On Sat, Mar 4, 2023 at 12:34 PM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Thu, Mar 02, 2023 at 03:50:13PM -0800, Andrii Nakryiko wrote: > > > Add bpf_for_each(), bpf_for() and bpf_repeat() macros that make writing > > > open-coded iterator-based loops much more convenient and natural. These > > > macro utilize cleanup attribute to ensure proper destruction of the > > > iterator and thanks to that manage to provide an ergonomic very close to > > > C language for construct. Typical integer loop would look like: > > > > > > int i; > > > int arr[N]; > > > > > > bpf_for(i, 0, N) { > > > /* verifier will know that i >= 0 && i < N, so could be used to > > > * directly access array elements with no extra checks > > > */ > > > arr[i] = i; > > > } > > > > > > bpf_repeat() is very similar, but it doesn't expose iteration number and > > > is mean as a simple "repeat action N times": > > > > > > bpf_repeat(N) { /* whatever */ } > > > > > > Note that break and continue inside the {} block work as expected. > > > > > > bpf_for_each() is a generalization over any kind of BPF open-coded > > > iterator allowing to use for-each-like approach instead of calling > > > low-level bpf_iter_<type>_{new,next,destroy}() APIs explicitly. E.g.: > > > > > > struct cgroup *cg; > > > > > > bpf_for_each(cgroup, cg, some, input, args) { > > > /* do something with each cg */ > > > } > > > > > > Would call (right now hypothetical) bpf_iter_cgroup_{new,next,destroy}() > > > functions to form a loop over cgroups, where `some, input, args` are > > > passed verbatim into constructor as > > > bpf_iter_cgroup_new(&it, some, input, args). > > > > > > As a demonstration, add pyperf variant based on bpf_for() loop. > > > > > > Also clean up few tests that either included bpf_misc.h header > > > unnecessarily from user-space or included it before any common types are > > > defined. > > > > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx> > > > --- > > > .../bpf/prog_tests/bpf_verif_scale.c | 6 ++ > > > .../bpf/prog_tests/uprobe_autoattach.c | 1 - > > > tools/testing/selftests/bpf/progs/bpf_misc.h | 76 +++++++++++++++++++ > > > tools/testing/selftests/bpf/progs/lsm.c | 4 +- > > > tools/testing/selftests/bpf/progs/pyperf.h | 14 +++- > > > .../selftests/bpf/progs/pyperf600_iter.c | 7 ++ > > > .../selftests/bpf/progs/pyperf600_nounroll.c | 3 - > > > 7 files changed, 101 insertions(+), 10 deletions(-) > > > create mode 100644 tools/testing/selftests/bpf/progs/pyperf600_iter.c > > > > > > diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c > > > index 5ca252823294..731c343897d8 100644 > > > --- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c > > > +++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c > > > @@ -144,6 +144,12 @@ void test_verif_scale_pyperf600_nounroll() > > > scale_test("pyperf600_nounroll.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false); > > > } > > > > > > +void test_verif_scale_pyperf600_iter() > > > +{ > > > + /* open-coded BPF iterator version */ > > > + scale_test("pyperf600_iter.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false); > > > +} > > > + > > > void test_verif_scale_loop1() > > > { > > > scale_test("loop1.bpf.o", BPF_PROG_TYPE_RAW_TRACEPOINT, false); > > > diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c > > > index 6558c857e620..d5b3377aa33c 100644 > > > --- a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c > > > +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c > > > @@ -3,7 +3,6 @@ > > > > > > #include <test_progs.h> > > > #include "test_uprobe_autoattach.skel.h" > > > -#include "progs/bpf_misc.h" > > > > > > /* uprobe attach point */ > > > static noinline int autoattach_trigger_func(int arg1, int arg2, int arg3, > > > diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h > > > index f704885aa534..08a791f307a6 100644 > > > --- a/tools/testing/selftests/bpf/progs/bpf_misc.h > > > +++ b/tools/testing/selftests/bpf/progs/bpf_misc.h > > > @@ -75,5 +75,81 @@ > > > #define FUNC_REG_ARG_CNT 5 > > > #endif > > > > > > +struct bpf_iter; > > > + > > > +extern int bpf_iter_num_new(struct bpf_iter *it__uninit, int start, int end) __ksym; > > > +extern int *bpf_iter_num_next(struct bpf_iter *it) __ksym; > > > +extern void bpf_iter_num_destroy(struct bpf_iter *it) __ksym; > > > + > > > +#ifndef bpf_for_each > > > +/* bpf_for_each(iter_kind, elem, args...) provides generic construct for using BPF > > > + * open-coded iterators without having to write mundane explicit low-level > > > + * loop. Instead, it provides for()-like generic construct that can be used > > > + * pretty naturally. E.g., for some hypothetical cgroup iterator, you'd write: > > > + * > > > + * struct cgroup *cg, *parent_cg = <...>; > > > + * > > > + * bpf_for_each(cgroup, cg, parent_cg, CG_ITER_CHILDREN) { > > > + * bpf_printk("Child cgroup id = %d", cg->cgroup_id); > > > + * if (cg->cgroup_id == 123) > > > + * break; > > > + * } > > > + * > > > + * I.e., it looks almost like high-level for each loop in other languages, > > > + * supports continue/break, and is verifiable by BPF verifier. > > > + * > > > + * For iterating integers, the difference betwen bpf_for_each(num, i, N, M) > > > + * and bpf_for(i, N, M) is in that bpf_for() provides additional proof to > > > + * verifier that i is in [N, M) range, and in bpf_for_each() case i is `int > > > + * *`, not just `int`. So for integers bpf_for() is more convenient. > > > + */ > > > +#define bpf_for_each(type, cur, args...) for ( \ > > > + /* initialize and define destructor */ \ > > > + struct bpf_iter ___it __attribute__((cleanup(bpf_iter_##type##_destroy))), \ > > > > We should probably say somewhere that it requires C99 with some flag that allows > > declaring variables inside the loop. > > yes, I'll add a comment. I think cleanup attribute isn't standard as > well, I'll mention it. This shouldn't be restrictive, though, as we > expect very modern Clang (or eventually GCC), which definitely will > support all of that. And I feel like most people don't restrict their > BPF-side code to strict C89 anyways. yep. No UX concerns. A comment to manage expectations should be enough. > > > > Also what are the rules for attr(cleanup()). > > When does it get called? > > From GCC documentation: > > > The cleanup attribute runs a function when the variable goes out of scope. > > So given ___it is bound to for loop, any code path that leads to loop > exit (so, when condition turns false or *breaking* out of the loop, > which is why I use cleanup, this was a saving grace for this approach > to work at all). +1 > > > My understanding that the visibility of ___it is only within for() body. > > right > > > So when the prog does: > > bpf_for(i, 0, 10) sum += i; > > bpf_for(i, 0, 10) sum += i; > > > > the compiler should generate bpf_iter_num_destroy right after each bpf_for() ? > > Conceptually, yes, but see the note about breaking out of the loop > above. How actual assembly code is generated is beyond our control. If > the compiler generates multiple separate code paths, each with its own > destroy, that's fine as well. No assumptions are made in the verifier, > we just need to see one bpf_iter_<type>_destroy() for each instance of > iterator. > > > Or will it group them at the end of function body and destroy all iterators ? > > That would be a bug, as documentation states that clean up happens as > soon as a variable goes out of scope. Delaying clean up could result > in program logic bugs. I.e., we rely on destructors to be called as > soon as possible. > > > Will compiler reuse the stack space used by ___it in case there are multiple bpf_for-s ? > > That's the question to compiler developers, but I'd assume that, yes, > it should. Why not? > > And looking at, for example, iter_pass_iter_ptr_to_subprog which has 4 > sequential bpf_for() loops: > > 0000000000002328 <iter_pass_iter_ptr_to_subprog>: > 1125: bf a6 00 00 00 00 00 00 r6 = r10 > 1126: 07 06 00 00 28 ff ff ff r6 += -216 <-- THIS IS ITER > 1127: bf 61 00 00 00 00 00 00 r1 = r6 > 1128: b4 02 00 00 00 00 00 00 w2 = 0 > 1129: b4 03 00 00 10 00 00 00 w3 = 16 > 1130: 85 10 00 00 ff ff ff ff call -1 > 0000000000002350: R_BPF_64_32 bpf_iter_num_new > 1131: bf a7 00 00 00 00 00 00 r7 = r10 > 1132: 07 07 00 00 c0 ff ff ff r7 += -64 > 1133: bf 61 00 00 00 00 00 00 r1 = r6 > 1134: bf 72 00 00 00 00 00 00 r2 = r7 > 1135: b4 03 00 00 10 00 00 00 w3 = 16 > 1136: b4 04 00 00 02 00 00 00 w4 = 2 > 1137: 85 10 00 00 53 00 00 00 call 83 > 0000000000002388: R_BPF_64_32 .text > 1138: bf 61 00 00 00 00 00 00 r1 = r6 > 1139: 85 10 00 00 ff ff ff ff call -1 > 0000000000002398: R_BPF_64_32 bpf_iter_num_destroy > 1140: bf 61 00 00 00 00 00 00 r1 = r6 <<--- HERE WE REUSE > 1141: b4 02 00 00 00 00 00 00 w2 = 0 > 1142: b4 03 00 00 20 00 00 00 w3 = 32 > 1143: 85 10 00 00 ff ff ff ff call -1 > 00000000000023b8: R_BPF_64_32 bpf_iter_num_new > > Note that r6 is set to fp-216 and is just reused as is for second > bpf_for loop (second bpf_iter_num_new) call. Great. Thanks for checking. I was worried that attr(cleanup) will force compiler to think that different objects cannot reuse the stack slots which will lead to stack exhaustion. Especially with 24-bytes bpf_iter-s and our tiny 512 limit. > > > > > + /* ___p pointer is just to call bpf_iter_##type##_new() *once* to init ___it */ \ > > > + *___p = (bpf_iter_##type##_new(&___it, ##args), \ > > > + /* this is a workaround for Clang bug: it currently doesn't emit BTF */ \ > > > + /* for bpf_iter_##type##_destroy when used from cleanup() attribute */ \ > > > + (void)bpf_iter_##type##_destroy, (void *)0); \ > > > + /* iteration and termination check */ \ > > > + ((cur = bpf_iter_##type##_next(&___it))); \ > > > + /* nothing here */ \ > > > +) > > > +#endif /* bpf_for_each */ > > > + > > > +#ifndef bpf_for > > > +/* bpf_for(i, start, end) proves to verifier that i is in [start, end) */ > > > +#define bpf_for(i, start, end) for ( \ > > > + /* initialize and define destructor */ \ > > > + struct bpf_iter ___it __attribute__((cleanup(bpf_iter_num_destroy))), \ > > > + /* ___p pointer is necessary to call bpf_iter_num_new() *once* to init ___it */ \ > > > + *___p = (bpf_iter_num_new(&___it, (start), (end)), \ > > > + /* this is a workaround for Clang bug: it currently doesn't emit BTF */ \ > > > + /* for bpf_iter_num_destroy when used from cleanup() attribute */ \ > > > + (void)bpf_iter_num_destroy, (void *)0); \ > > > + ({ \ > > > + /* iteration step */ \ > > > + int *___t = bpf_iter_num_next(&___it); \ > > > + /* termination and bounds check */ \ > > > + (___t && ((i) = *___t, i >= (start) && i < (end))); \ > > > > The i >= (start) && i < (end) is necessary to make sure that the verifier > > tightens the range of 'i' inside the body of the loop and > > when the program does arr[i] access the verifier will know that 'i' is within bounds, right? > > yes, it feels like a common pattern, but I was contemplating to add > bpf_for_uncheck() where we "expose" i value, but don't do check. I > decided to keep it simple, as most examples actually required bounds > checks anyways. And for cases where we don't, often bpf_repeat() > suffices. One other way would be to expose i from bpf_repeat(), just > no checks. > > One restriction with this approach is that I can't define both `struct > bpf_iter __it` and `int i` inside for loop, so cur/i has to be > declared and passed into bpf_for/bpf_for_each by user explicitly. So > for bpf_repeat() to expose i would require always doing: > > int i; > > bpf_repeat(i, 100) { /* i is set to 0, 1, ..., 99 */ } > > which, if you don't care about iteration number, is a bit of waste. So > don't know, I'm undecided on bpf_repeat with i. It feels that the proposed bpf_repeat() without 'i' is cleaner. > > > > In such case should we add __builtin_constant_p() check for 'start' and 'end' ? > > that seems to defy the purpose, as if you know start/end statically, > you might as well just write either unrolled loop or bounded for() > loop > > > int arr[100]; > > if (var < 100) > > bpf_for(i, 0, global_var) sum += arr[i]; > > will fail the verifier and the users might complain of dumb verifier. > > > but I'm not following this example, so maybe the answer to above would > be different if I would. What's var and global_var? typo. I meant the same var in both places. > Are they supposed > to be the same thing? If yes, why would that bpf_for() loop fail? > > I suspect you are conflating the other pattern I pointed out with: > > int cnt = 0; > > bpf_for_each(...) { > if (cnt++ > 100) > break; > } > > It's different, as cnt comes from outside the loop and is updated on > each iteration. While for > > bpf_for(i, 0, var) { > if (i > 100) > break; > } > > it should work fine, as i is originally unknowable, then narrowed to > *a range* [0, 99]. But that [0, 99] knowledge is part of "inner loop > body" state, so it will still converge as it's going to be basically > ignored during the equivalence check on next bpf_iter_num_next() (old > state didn't know about i yet). > > > > > Also if start and end are variables they potentially can change between bpf_iter_num_new() > > and in each iteration of the loop. > > __builtin_constant_p() might be too restrictive. > > yep, I think so > > > May be read start/end once, at least? > > I can't do that, as for() allows only one type of variables to be > defined (note `*___p` hack as well), so there is no place to remember > start/end, unfortunately... I see. Abusing bpf_iter like: for (struct bpf_iter_num __it, *___p, start_end = (struct bpf_iter_num) {start, end}; is probably too ugly just to READ_ONCE(start). Especially since bpf_iter_num is opaque. Please add a comment to warn users that if start/end are variables they better not change during bpf_for(). I guess plain C loop: for (int i = 0; i < j; i++) will go crazy too if 'j' changes. I'm not sure what standard says here. Whether compiler has to reload 'j' every iteration or not. > So it's a tradeoff. I can drop range validation, but then every > example and lots of production code would be re-checking these > conditions. Understood. Keep it as-is. > > > > > + }); \ > > > + /* nothing here */ \ btw that 'nothing here' comment is distracting when reading this macro.