Re: [BUG] verifier escape with iteration helpers (bpf_loop, ...)

Eduard Zingerman <eddyz87@xxxxxxxxx> · Wed, 20 Sep 2023 03:19:48 +0300

On Tue, 2023-09-19 at 16:02 -0700, Andrii Nakryiko wrote:
> On Tue, Sep 19, 2023 at 9:28 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> > 
> > It looks like I hit a related but slightly different bug with bpf_iter_next().
> > Consider the following example:
> > 
> >     SEC("fentry/" SYS_PREFIX "sys_nanosleep")
> >     int num_iter_bug(const void *ctx) {
> >         struct bpf_iter_num it;                 // fp[-8] below
> >         __u64 val = 0;                          // fp[-16] in the below
> >         __u64 *ptr = &val;                      // r7 below
> >         __u64 rnd = bpf_get_current_pid_tgid(); // r6 below
> >         void *v;
> > 
> >         bpf_iter_num_new(&it, 0, 10);
> >         while ((v = bpf_iter_num_next(&it))) {
> >             rnd++;
> >             if (rnd == 42) {
> >                 ptr = (void*)(0xdead);
> >                 continue;
> >             }
> >             bpf_probe_read_user(ptr, 8, (void*)(0xdeadbeef));
> >         }
> >         bpf_iter_num_destroy(&it);
> >         return 0;
> >     }
> > 
> > (Unfortunately, it had to be converted to assembly to avoid compiler
> >  clobbering loop structure, complete test case is at the end of the email).
> > 
> > The example is not safe because of 0xdead being a possible `ptr` value.
> > However, currently it is marked as safe.
> > 
> > This happens because of states_equal() usage for iterator convergence
> > detection:
> > 
> >     static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> >         ...
> >         while (sl)
> >                 states_cnt++;
> >                 if (sl->state.insn_idx != insn_idx)
> >                         goto next;
> > 
> >                 if (sl->state.branches)
> >                 ...
> >                         if (is_iter_next_insn(env, insn_idx)) {
> >                                 if (states_equal(env, &sl->state, cur)) {
> >                         ...
> >                                         if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE)
> >                                                 goto hit;
> >                                 }
> >                                 goto skip_inf_loop_check;
> >                         }
> >         ...
> > 
> > With some additional logging I see that the following states are
> > considered equal:
> > 
> >     13: (85) call bpf_iter_num_next#59908
> >     ...
> >     at is_iter_next_insn(insn 13):
> >       old state:
> >          R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7=fp-16 R10=fp0
> >          fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16=00000000 refs=2
> >       cur state:
> >          R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R1_w=fp-8 R6=42 R7_w=57005
> >          R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2
> >     states_equal()?: true
> > 
> > Note that R7=fp-16 in old state vs R7_w=57005 in cur state.
> > The registers are considered equal because R7 does not have a read mark.
> > However read marks are not yet finalized for old state because
> > sl->state.branches != 0. (Note: precision marks are not finalized as
> > well, which should be a problem, but this requires another example).
> 
> I originally convinced myself that non-finalized precision marking is
> fine, see the comment next to process_iter_next_call() for reasoning.
> If you can have a dedicated example for precision that would be great.
> 
> As for read marks... Yep, that's a bummer. That r7 is not used after
> the loop, so is not marked as read, and it's basically ignored for
> loop invariant checking, which is definitely not what was intended.

Liveness is a precondition for all subsequent checks, so example
involving precision would also involve liveness. Here is a combined
example:

    r8 = 0
    fp[-16] = 0
    r7 = -16
    r6 = bpf_get_current_pid_tgid()
    bpf_iter_num_new(&fp[-8], 0, 10)
    while (bpf_iter_num_next(&fp[-8])) {
      r6++
      if (r6 != 42) {
        r7 = -32
        continue;
      }
      r0 = r10
      r0 += r7
      r8 = *(u64 *)(r0 + 0)
    }
    bpf_iter_num_destroy(&fp[-8])
    return r8

(Complete source code is at the end of the email).

The call to bpf_iter_num_next() is reachable with r7 values -16 and -32.
State with r7=-16 is visited first, at which point r7 has no read mark
and is not marked precise.
State with r7=-32 is visited second:
- states_equal() for is_iter_next_insn() should ignore absence of
  REG_LIVE_READ mark on r7, otherwise both states would be considered
  identical;
- states_equal() for is_iter_next_insn() should ignore absence of
  precision mark on r7, otherwise both states would be considered
  identical.

> > A possible fix is to add a special flag to states_equal() and
> > conditionally ignore logic related to liveness and precision when this
> > flag is set. Set this flag for is_iter_next_insn() branch above.
> 
> Probably not completely ignore liveness (and maybe precision, but
> let's do it one step at a time), but only at least one of the
> registers or stack slots are marked as written or read in one of
> old/new states? Basically, if some register/slot at the end of
> iteration is read or modified, it must be important for loop
> invariant, so even if the parent state says it's not read, we still
> treat it as read. Can you please try that with just read/write marks
> and see if anything fails?

Unfortunately for the example above neither liveness nor precision
marks are present when states are compared, e.g. here is the
(extended) log:

12: (85) call bpf_iter_num_next#52356
...
is_iter_next (12):
  old:
     R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7=-16 R8_r=0 R10=fp0
     fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16=00000000 refs=2
  new:
     R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R1_w=fp-8 R6=42 R7_w=-32 R8=0 R10=fp0
     fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2
> hit

Note that for old state r7 value is not really important to get to exit,
the verification trace looks as follows:
1. ... start ...
2. bpf_iter_num_next(&fp[-8]) ;; r7=-16, checkpoint saved
3. r6++
4. r7 = -32
5. bpf_iter_num_next(&fp[-8]) ;; push new state with r7=-32
6. bpf_iter_num_destroy(&fp[-8])
7. return 8

The r7 liveness and precision is important for state scheduled at (5)
but it is not known yet, so when both states are compared marks are
absent in cur and in old.

---

/* BTF FUNC records are not generated for kfuncs referenced
 * from inline assembly. These records are necessary for
 * libbpf to link the program. The function below is a hack
 * to ensure that BTF FUNC records are generated.
 */
void __kfunc_btf_root(void)
{
	bpf_iter_num_new(0, 0, 0);
	bpf_iter_num_next(0);
	bpf_iter_num_destroy(0);
}

SEC("fentry/" SYS_PREFIX "sys_nanosleep")
__failure
__msg("20: (79) r8 = *(u64 *)(r0 +0)")
__msg("invalid read from stack R0 off=-32 size=8")
__naked int iter_next_exact(const void *ctx)
{
	asm volatile (
		"r8 = 0;"
		"*(u64 *)(r10 - 16) = r8;"
		"r7 = -16;"
		"call %[bpf_get_current_pid_tgid];"
		"r6 = r0;"
		"r1 = r10;"
		"r1 += -8;"
		"r2 = 0;"
		"r3 = 10;"
		"call %[bpf_iter_num_new];"
	"1:"
		"r1 = r10;"
		"r1 += -8;\n"
		"call %[bpf_iter_num_next];"
		"if r0 == 0 goto 2f;"
		"r6 += 1;"
		"if r6 != 42 goto 3f;"
		"r7 = -32;"
		"goto 1b;\n"
	"3:"
		"r0 = r10;"
		"r0 += r7;"
		"r8 = *(u64 *)(r0 + 0);"
		"goto 1b;\n"
	"2:"
		"r1 = r10;"
		"r1 += -8;"
		"call %[bpf_iter_num_destroy];"
		"r0 = r8;"
		"exit;"
		:
		: __imm(bpf_get_current_pid_tgid),
		  __imm(bpf_iter_num_new),
		  __imm(bpf_iter_num_next),
		  __imm(bpf_iter_num_destroy),
		  __imm(bpf_probe_read_user)
		: __clobber_all
	);
}