Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 20 Sep 2023 09:57:13 -0700

On Wed, Sep 20, 2023 at 9:20 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Wed, 2023-09-20 at 03:19 +0300, Eduard Zingerman wrote:
> [...]
> > Liveness is a precondition for all subsequent checks, so example
> > involving precision would also involve liveness. Here is a combined
> > example:
> >
> >     r8 = 0
> >     fp[-16] = 0
> >     r7 = -16
> >     r6 = bpf_get_current_pid_tgid()
> >     bpf_iter_num_new(&fp[-8], 0, 10)
> >     while (bpf_iter_num_next(&fp[-8])) {
> >       r6++
> >       if (r6 != 42) {
> >         r7 = -32
> >         continue;
> >       }
> >       r0 = r10
> >       r0 += r7
> >       r8 = *(u64 *)(r0 + 0)
> >     }
> >     bpf_iter_num_destroy(&fp[-8])
> >     return r8
> >
> > (Complete source code is at the end of the email).
> >
> > The call to bpf_iter_num_next() is reachable with r7 values -16 and -32.
> > State with r7=-16 is visited first, at which point r7 has no read mark
> > and is not marked precise.
> > State with r7=-32 is visited second:
> > - states_equal() for is_iter_next_insn() should ignore absence of
> >   REG_LIVE_READ mark on r7, otherwise both states would be considered
> >   identical;
> > - states_equal() for is_iter_next_insn() should ignore absence of
> >   precision mark on r7, otherwise both states would be considered
> >   identical.
> >
> > > > A possible fix is to add a special flag to states_equal() and
> > > > conditionally ignore logic related to liveness and precision when this
> > > > flag is set. Set this flag for is_iter_next_insn() branch above.
> > >
> > > Probably not completely ignore liveness (and maybe precision, but
> > > let's do it one step at a time), but only at least one of the
> > > registers or stack slots are marked as written or read in one of
> > > old/new states? Basically, if some register/slot at the end of
> > > iteration is read or modified, it must be important for loop
> > > invariant, so even if the parent state says it's not read, we still
> > > treat it as read. Can you please try that with just read/write marks
> > > and see if anything fails?
> >
> > Unfortunately for the example above neither liveness nor precision
> > marks are present when states are compared, e.g. here is the
> > (extended) log:
> >
> > 12: (85) call bpf_iter_num_next#52356
> > ...
> > is_iter_next (12):
> >   old:
> >      R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7=-16 R8_r=0 R10=fp0
> >      fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16=00000000 refs=2
> >   new:
> >      R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R1_w=fp-8 R6=42 R7_w=-32 R8=0 R10=fp0
> >      fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2
> > > hit
>
> It might be the case that I miss-read situation a bit, because there
> is also R6 which is actually used to pick the branch. And it has read
> mark but does not have precision mark, because jump was not predicted.
>
> In general, if we hit an old state with .branches > 0:
> - either there is an execution path "old -> safe exit";
> - or there is a loop (not necessarily because of iterator) and old is
>   a [grand]parent of the "cur" state.
>
> In either case, "cur" state is safe to prune if:
> - it is possible to show that it would take same jump branches as
>   "old" state;
> - and iteration depth differs.
>
> All registers that took part in conditional jumps should already have
> read marks in old state. However, not all jumps could be predicted,
> thus not all such registers are marked as precise.
>
> Unfortunately, treating read marks as precision marks is not sufficient.
> For the particular example above:
>
>     old: ... R6_r=scalar(id=1) ...
>     cur: ... R6=42 ...
>
> Here range_within() check in regsafe() would return true and old and
> cur would still be considered identical:
>
>     static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
>                 struct bpf_reg_state *rcur, struct bpf_idmap *idmap)
>     {
>         ...
>         switch (base_type(rold->type)) {
>         case SCALAR_VALUE:
>             ...
>             if (!rold->precise)
>                 return true;
>             ...
>             return range_within(rold, rcur) &&
>                    tnum_in(rold->var_off, rcur->var_off) &&
>                    check_scalar_ids(rold->id, rcur->id, idmap);
>         case ...
>         }
>         ...
>     }
>
> So it appears that there is no need to ignore liveness checks after all,
> but I don't see a way to make non-exact scalar comparisons at the moment.

Yeah, me neither. We can try something drastic like marking all
registers in the state as precise at the beginning of
iteration/callback execution. But I suspect that will pessimize
verification a lot and might start rejecting valid cases.

Don't know, I'm still trying to think of something that would work.

>
> > Note that for old state r7 value is not really important to get to exit,
> > the verification trace looks as follows:
> > 1. ... start ...
> > 2. bpf_iter_num_next(&fp[-8]) ;; r7=-16, checkpoint saved
> > 3. r6++
> > 4. r7 = -32
> > 5. bpf_iter_num_next(&fp[-8]) ;; push new state with r7=-32
> > 6. bpf_iter_num_destroy(&fp[-8])
> > 7. return 8
> >
> > The r7 liveness and precision is important for state scheduled at (5)
> > but it is not known yet, so when both states are compared marks are
> > absent in cur and in old.
> >
> > ---
> >
> > /* BTF FUNC records are not generated for kfuncs referenced
> >  * from inline assembly. These records are necessary for
> >  * libbpf to link the program. The function below is a hack
> >  * to ensure that BTF FUNC records are generated.
> >  */
> > void __kfunc_btf_root(void)
> > {
> >       bpf_iter_num_new(0, 0, 0);
> >       bpf_iter_num_next(0);
> >       bpf_iter_num_destroy(0);
> > }
> >
> > SEC("fentry/" SYS_PREFIX "sys_nanosleep")
> > __failure
> > __msg("20: (79) r8 = *(u64 *)(r0 +0)")
> > __msg("invalid read from stack R0 off=-32 size=8")
> > __naked int iter_next_exact(const void *ctx)
> > {
> >       asm volatile (
> >               "r8 = 0;"
> >               "*(u64 *)(r10 - 16) = r8;"
> >               "r7 = -16;"
> >               "call %[bpf_get_current_pid_tgid];"
> >               "r6 = r0;"
> >               "r1 = r10;"
> >               "r1 += -8;"
> >               "r2 = 0;"
> >               "r3 = 10;"
> >               "call %[bpf_iter_num_new];"
> >       "1:"
> >               "r1 = r10;"
> >               "r1 += -8;\n"
> >               "call %[bpf_iter_num_next];"
> >               "if r0 == 0 goto 2f;"
> >               "r6 += 1;"
> >               "if r6 != 42 goto 3f;"
> >               "r7 = -32;"
> >               "goto 1b;\n"
> >       "3:"
> >               "r0 = r10;"
> >               "r0 += r7;"
> >               "r8 = *(u64 *)(r0 + 0);"
> >               "goto 1b;\n"
> >       "2:"
> >               "r1 = r10;"
> >               "r1 += -8;"
> >               "call %[bpf_iter_num_destroy];"
> >               "r0 = r8;"
> >               "exit;"
> >               :
> >               : __imm(bpf_get_current_pid_tgid),
> >                 __imm(bpf_iter_num_new),
> >                 __imm(bpf_iter_num_next),
> >                 __imm(bpf_iter_num_destroy),
> >                 __imm(bpf_probe_read_user)
> >               : __clobber_all
> >       );
> > }
> >
>