Re: [PATCH bpf-next 1/2] bpf: Introduce bpf_can_loop() kfunc

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Fri, 23 Feb 2024 16:22:11 -0800

On Fri, Feb 23, 2024 at 4:00 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Wed, 2024-02-21 at 22:33 -0800, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >
> > While working on bpf_arena the following monster macro had to be
> > used to iterate a link list:
> >         for (struct bpf_iter_num ___it __attribute__((aligned(8),                       \
> >                                                       cleanup(bpf_iter_num_destroy))),  \
> >                         * ___tmp = (                                                    \
> >                                 bpf_iter_num_new(&___it, 0, (1000000)),                 \
> >                                 pos = list_entry_safe((head)->first,                    \
> >                                                       typeof(*(pos)), member),          \
> >                                 (void)bpf_iter_num_destroy, (void *)0);                 \
> >              bpf_iter_num_next(&___it) && pos &&                                        \
> >                 ({ ___tmp = (void *)pos->member.next; 1; });                            \
> >              pos = list_entry_safe((void __arena *)___tmp, typeof(*(pos)), member))
> >
> > It's similar to bpf_for(), bpf_repeat() macros.
> > Unfortunately every "for" in normal C code needs an equivalent monster macro.
>
> Tbh, I really don't like the idea of adding more hacks for loop handling.
> This would be the fourth: regular bounded loops, bpf_loop, iterators
> and now bpf_can_loop. Imo, it would be better to:
> - either create a reusable "semi-monster" macro e.g. "__with_bound(iter_name)"
>   that would hide "struct bpf_iter_num iter_name ... cleanup ..." declaration
>   to simplify definitions of other iterating macro;
> - or add kfuncs for list iterator.

I think you're missing the point.
It's not about this particular list iterator.
It's about _all_ for(), while() loops.
I've started converting lib/radix-tree.c to bpf and arena.
There are hundreds of various loops that need to be converted.
The best is to copy-paste them as-is and add bpf_can_loop() to loop
condition. That's it.
Otherwise explicit iterators are changing the code significantly
and distract from the logic of the algorithm.

Another key point is the last sentence of the commit log:
"New instruction with the same semantics can be added, so that LLVM
can generate it."

This is the way to implement __builtin_memcpy, __builtin_strcmp
and friends in llvm and gcc.

We can go a step further and add such insn to any for/while loop
that uses arena pointers.
Then running kernel code as bpf program will be copy-paste
and addition of __arena tag. Easy.

And if we can make:
#pragma clang attribute push (__attribute__((address_space(1))),
apply_to = pointers)

work, then we won't even need to copy paste!
We will be able to compile lib/radix-tree.c with --target=bpf as-is.

> Having said that, I don't see any obvious technical issues with this particular patch.
> A few nits below.
>
> [...]
>
> > @@ -7954,10 +7956,14 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
> >       struct bpf_reg_state *cur_iter, *queued_iter;
> >       int iter_frameno = meta->iter.frameno;
> >       int iter_spi = meta->iter.spi;
> > +     bool is_can_loop = is_can_loop_kfunc(meta);
> >
> >       BTF_TYPE_EMIT(struct bpf_iter);
> >
> > -     cur_iter = &env->cur_state->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
> > +     if (is_can_loop)
> > +             cur_iter = &cur_st->can_loop_reg;
> > +     else
> > +             cur_iter = &cur_st->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
>
> I think that adding of a utility function hiding this choice, e.g.:
>
>     get_iter_reg(struct bpf_verifier_state *st, int insn_idx)
>
> would simplify the code a bit, here and in is_state_visited().

Hmm. That sounds like obfuscation, since 'meta' would need to be passed in,
but is_state_visited() doesn't have meta.
Create fake meta there?!

I'm missing how such get_iter_reg() helper will look.
meta->iter.frameno was populated by process_iter_arg().
Far away from process_iter_next_call().

>
> [...]
>
> > @@ -19549,6 +19608,8 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> >                       delta    += cnt - 1;
> >                       env->prog = prog = new_prog;
> >                       insn      = new_prog->insnsi + i + delta;
> > +                     if (stack_extra)
> > +                             stack_depth_extra = max(stack_depth_extra, stack_extra);
>
> I don't think that using "max" here makes much sense.
> If several sorts of kfuncs would need extra stack space,
> most likely this space won't be shared,

I see. I was assuming that the space will be shared.
Like in the follow up I was planning to roll optimize_bpf_loop()
into this. And optimize_bpf_loop() will be sharing the scratch area.
But now, I'm not sure what I was thinking, since bpf_can_loop()
cannot share that space with optimize_bpf_loop().

> e.g. bpf_can_loop() would need
> it's 8 bytes + bpf_some_new_feature() would need 16 bytes on top of that etc.
> So some kind of flags indicating space for which features is reacquired is needed here.

Yeah. It needs some way to indicate shared stack_extra vs exclusive.
Will fix in v2.