On Tue, Sep 19, 2023 at 9:28 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > It looks like I hit a related but slightly different bug with bpf_iter_next(). > Consider the following example: > > SEC("fentry/" SYS_PREFIX "sys_nanosleep") > int num_iter_bug(const void *ctx) { > struct bpf_iter_num it; // fp[-8] below > __u64 val = 0; // fp[-16] in the below > __u64 *ptr = &val; // r7 below > __u64 rnd = bpf_get_current_pid_tgid(); // r6 below > void *v; > > bpf_iter_num_new(&it, 0, 10); > while ((v = bpf_iter_num_next(&it))) { > rnd++; > if (rnd == 42) { > ptr = (void*)(0xdead); > continue; > } > bpf_probe_read_user(ptr, 8, (void*)(0xdeadbeef)); > } > bpf_iter_num_destroy(&it); > return 0; > } > > (Unfortunately, it had to be converted to assembly to avoid compiler > clobbering loop structure, complete test case is at the end of the email). > > The example is not safe because of 0xdead being a possible `ptr` value. > However, currently it is marked as safe. > > This happens because of states_equal() usage for iterator convergence > detection: > > static int is_state_visited(struct bpf_verifier_env *env, int insn_idx) > ... > while (sl) > states_cnt++; > if (sl->state.insn_idx != insn_idx) > goto next; > > if (sl->state.branches) > ... > if (is_iter_next_insn(env, insn_idx)) { > if (states_equal(env, &sl->state, cur)) { > ... > if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE) > goto hit; > } > goto skip_inf_loop_check; > } > ... > > With some additional logging I see that the following states are > considered equal: > > 13: (85) call bpf_iter_num_next#59908 > ... > at is_iter_next_insn(insn 13): > old state: > R0=scalar() R1_rw=fp-8 R6_r=scalar(id=1) R7=fp-16 R10=fp0 > fp-8_r=iter_num(ref_id=2,state=active,depth=0) fp-16=00000000 refs=2 > cur state: > R0=rdonly_mem(id=3,ref_obj_id=2,off=0,imm=0) R1_w=fp-8 R6=42 R7_w=57005 > R10=fp0 fp-8=iter_num(ref_id=2,state=active,depth=1) fp-16=00000000 refs=2 > states_equal()?: true > > Note that R7=fp-16 in old state vs R7_w=57005 in cur state. > The registers are considered equal because R7 does not have a read mark. > However read marks are not yet finalized for old state because > sl->state.branches != 0. (Note: precision marks are not finalized as > well, which should be a problem, but this requires another example). I originally convinced myself that non-finalized precision marking is fine, see the comment next to process_iter_next_call() for reasoning. If you can have a dedicated example for precision that would be great. As for read marks... Yep, that's a bummer. That r7 is not used after the loop, so is not marked as read, and it's basically ignored for loop invariant checking, which is definitely not what was intended. > > A possible fix is to add a special flag to states_equal() and > conditionally ignore logic related to liveness and precision when this > flag is set. Set this flag for is_iter_next_insn() branch above. Probably not completely ignore liveness (and maybe precision, but let's do it one step at a time), but only at least one of the registers or stack slots are marked as written or read in one of old/new states? Basically, if some register/slot at the end of iteration is read or modified, it must be important for loop invariant, so even if the parent state says it's not read, we still treat it as read. Can you please try that with just read/write marks and see if anything fails? > > --- > > /* BTF FUNC records are not generated for kfuncs referenced > * from inline assembly. These records are necessary for > * libbpf to link the program. The function below is a hack > * to ensure that BTF FUNC records are generated. > */ > void __kfunc_btf_root(void) > { > bpf_iter_num_new(0, 0, 0); > bpf_iter_num_next(0); > bpf_iter_num_destroy(0); > } > > SEC("fentry/" SYS_PREFIX "sys_nanosleep") > __naked int num_iter_bug(const void *ctx) > { > asm volatile ( > // r7 = &fp[-16] > // fp[-16] = 0 > "r7 = r10;" > "r7 += -16;" > "r0 = 0;" > "*(u64 *)(r7 + 0) = r0;" > // r6 = bpf_get_current_pid_tgid() > "call %[bpf_get_current_pid_tgid];" > "r6 = r0;" > // bpf_iter_num_new(&fp[-8], 0, 10) > "r1 = r10;" > "r1 += -8;" > "r2 = 0;" > "r3 = 10;" > "call %[bpf_iter_num_new];" > // while (bpf_iter_num_next(&fp[-8])) { > // r6++ > // if (r6 != 42) { > // r7 = 0xdead > // continue; > // } > // bpf_probe_read_user(r7, 8, 0xdeadbeef) > // } > "1:" > "r1 = r10;" > "r1 += -8;" > "call %[bpf_iter_num_next];" > "if r0 == 0 goto 2f;" > "r6 += 1;" > "if r6 != 42 goto 3f;" > "r7 = 0xdead;" > "goto 1b;" > "3:" > "r1 = r7;" > "r2 = 8;" > "r3 = 0xdeadbeef;" > "call %[bpf_probe_read_user];" > "goto 1b;" > "2:" > // bpf_iter_num_destroy(&fp[-8]) > "r1 = r10;" > "r1 += -8;" > "call %[bpf_iter_num_destroy];" > // return 0 > "r0 = 0;" > "exit;" > : > : __imm(bpf_get_current_pid_tgid), > __imm(bpf_iter_num_new), > __imm(bpf_iter_num_next), > __imm(bpf_iter_num_destroy), > __imm(bpf_probe_read_user) > : __clobber_all > ); > }