Re: [PATCH bpf-next 1/2] bpf: Introduce bpf_can_loop() kfunc

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Tue, 27 Feb 2024 22:30:46 -0800

On Fri, Feb 23, 2024 at 5:55 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Feb 23, 2024 at 4:50 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> >
> > On Fri, 2024-02-23 at 16:22 -0800, Alexei Starovoitov wrote:
> > [...]
> >
> > > I think you're missing the point.
> > > It's not about this particular list iterator.
> > > It's about _all_ for(), while() loops.
> > > I've started converting lib/radix-tree.c to bpf and arena.
> > > There are hundreds of various loops that need to be converted.
> > > The best is to copy-paste them as-is and add bpf_can_loop() to loop
> > > condition. That's it.
> > > Otherwise explicit iterators are changing the code significantly
> > > and distract from the logic of the algorithm.
> > >
> > > Another key point is the last sentence of the commit log:
> > > "New instruction with the same semantics can be added, so that LLVM
> > > can generate it."
> > >
> > > This is the way to implement __builtin_memcpy, __builtin_strcmp
> > > and friends in llvm and gcc.
> >
> > There are two things that usage of bpf_can_loop() provides:
> > 1. A proof that BPF program would terminate at runtime.
> > 2. A way for verifier to terminate verification process
> >    (by stopping processing some path when two verifier states are exactly equal).
> >
> > The (1) is iffy, because there are simple ways to forgo it in practical terms.
> > E.g. for the program below it would be possible to make 64 * 10^12 iterations
> > at runtime:
> >
> >     void bar(...) {
> >       while (... && bpf_can_loop())
> >         ... do something ...;
> >     }
> >
> >     void foo(...) {
> >       while (... && bpf_can_loop())
> >         bar();
> >     }
>
> so ?
> bpf_loop() helper and open coded iterators can do the same already.
> It's something we need to fix regardless.
>
> (1) is not iffy. The program will terminate. That's a 100% guarantee.
>
> > If we decide that for some programs it is not necessary to enforce
> > proof of runtime termination, then it would be possible to untie (2)
> > from iterators and just check if looping state is states_equal(... exact=true)
> > to some previous one.
>
> No. That's not at all the same.
> Looping and eventually exiting is a strong guarantee by
> the verifier and the users know that all paths to exit are explored.
> Just "looping is ok" without exit guarantee
> means that a bunch of code may not be visited by the verifier.
> Arguably dead code elimination should kick in,
> but I don't think it's a territory we can go to.
>
> >
> > [...]
> >
> > > > > @@ -7954,10 +7956,14 @@ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx,
> > > > >       struct bpf_reg_state *cur_iter, *queued_iter;
> > > > >       int iter_frameno = meta->iter.frameno;
> > > > >       int iter_spi = meta->iter.spi;
> > > > > +     bool is_can_loop = is_can_loop_kfunc(meta);
> > > > >
> > > > >       BTF_TYPE_EMIT(struct bpf_iter);
> > > > >
> > > > > -     cur_iter = &env->cur_state->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
> > > > > +     if (is_can_loop)
> > > > > +             cur_iter = &cur_st->can_loop_reg;
> > > > > +     else
> > > > > +             cur_iter = &cur_st->frame[iter_frameno]->stack[iter_spi].spilled_ptr;
> > > >
> > > > I think that adding of a utility function hiding this choice, e.g.:
> > > >
> > > >     get_iter_reg(struct bpf_verifier_state *st, int insn_idx)
> > > >
> > > > would simplify the code a bit, here and in is_state_visited().
> > >
> > > Hmm. That sounds like obfuscation, since 'meta' would need to be passed in,
> > > but is_state_visited() doesn't have meta.
> > > Create fake meta there?!
> > >
> > > I'm missing how such get_iter_reg() helper will look.
> > > meta->iter.frameno was populated by process_iter_arg().
> > > Far away from process_iter_next_call().
> >
> > I meant that this helper can peek spi from R1 just like code in
> > is_state_visited() does currently. Forgoing the 'meta' completely.
>
> I see.
> You mean removing:
>                 meta->iter.spi = spi;
>                 meta->iter.frameno = reg->frameno;
> from process_iter_arg() and
> 'meta' arg from process_iter_next_call() as well then ?

Ed,

That was a bad idea.
I tried what you suggested with
static struct bpf_reg_state *get_iter_reg(struct bpf_verifier_env *env,
                                          struct bpf_verifier_state
*st, int insn_idx)

and implemented in v2.
It's buggy as can be seen in CI (I sloppy tested it before sending it
yesterday).
I'm going to go back to v1 approach.
process_iter_next_call() _has_ to use meta,
since caller saved regs already cleared by the time it's called.
And doing fake 'meta' in is_state_visited() is not a good idea.
It took me a few hours to debug this :(