On Fri, Jul 5, 2024 at 1:59 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> Use bpf_verifier_state->jmp_history to track which registers were
> updated by find_equal_scalars() when conditional jump was verified.
> Use recorded information in backtrack_insn() to propagate precision.
>
> E.g. for the following program:
>
>             while verifying instructions
>   r1 = r0              |
>   if r1 < 8  goto ...  | push r0,r1 as equal_scalars in jmp_history
>   if r0 > 16 goto ...  | push r0,r1 as equal_scalars in jmp_history

linked_scalars? especially now that Alexei added offsets between
linked registers

>   r2 = r10             |
>   r2 += r0             v mark_chain_precision(r0)
>
>             while doing mark_chain_precision(r0)
>   r1 = r0              ^
>   if r1 < 8  goto ...  | mark r0,r1 as precise
>   if r0 > 16 goto ...  | mark r0,r1 as precise
>   r2 = r10             |
>   r2 += r0             | mark r0 precise

let's reverse the order here so it's linear in how the algorithm
actually works (backwards)?

>
> Technically, achieve this as follows:
> - Use 10 bits to identify each register that gains range because of
>   find_equal_scalars():

should this be renamed to find_linked_scalars() nowadays?

>   - 3 bits for frame number;
>   - 6 bits for register or stack slot number;
>   - 1 bit to indicate if register is spilled.
> - Use u64 as a vector of 6 such records + 4 bits for vector length.
> - Augment struct bpf_jmp_history_entry with field 'linked_regs'
>   representing such vector.
> - When doing check_cond_jmp_op() remember up to 6 registers that
>   gain range because of find_equal_scalars() in such a vector.
> - Don't propagate range information and reset IDs for registers that
>   don't fit in 6-value vector.
> - Push a pair {instruction index, equal scalars vector}
>   to bpf_verifier_state->jmp_history.
> - When doing backtrack_insn() check if any of recorded linked
>   registers is currently marked precise, if so mark all linked
>   registers as precise.
>
> This also requires fixes for two test_verifier tests:
> - precise: test 1
> - precise: test 2
>
> Both tests contain the following instruction sequence:
>
> 19: (bf) r2 = r9                      ; R2=scalar(id=3) R9=scalar(id=3)
> 20: (a5) if r2 < 0x8 goto pc+1        ; R2=scalar(id=3,umin=8)
> 21: (95) exit
> 22: (07) r2 += 1                      ; R2_w=scalar(id=3+1,...)
> 23: (bf) r1 = r10                     ; R1_w=fp0 R10=fp0
> 24: (07) r1 += -8                     ; R1_w=fp-8
> 25: (b7) r3 = 0                       ; R3_w=0
> 26: (85) call bpf_probe_read_kernel#113
>
> The call to bpf_probe_read_kernel() at (26) forces r2 to be precise.
> Previously, this forced all registers with same id to become precise
> immediately when mark_chain_precision() is called.
> After this change, the precision is propagated to registers sharing
> same id only when 'if' instruction is backtracked.
> Hence verification log for both tests is changed:
> regs=r2,r9 -> regs=r2 for instructions 25..20.
>
> Fixes: 904e6ddf4133 ("bpf: Use scalar ids in mark_chain_precision()")
> Reported-by: Hao Sun <sunhao.th@xxxxxxxxx>
> Closes: https://lore.kernel.org/bpf/CAEf4BzZ0xidVCqB47XnkXcNhkPWF6_nTV7yt+_Lf0kcFEut2Mg@xxxxxxxxxxxxxx/
> Suggested-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> ---
>  include/linux/bpf_verifier.h                  |   4 +
>  kernel/bpf/verifier.c                         | 231 ++++++++++++++++--
>  .../bpf/progs/verifier_subprog_precision.c    |   2 +-
>  .../testing/selftests/bpf/verifier/precise.c  |  20 +-
>  4 files changed, 232 insertions(+), 25 deletions(-)
>

The logic looks good (though I had a few small questions), I think.
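One aside on the encoding described above, mostly to check my own
reading of it: each linked register costs one 10-bit record (3 bits of
frame number, 6 bits of regno/spi, 1 bit for the register-vs-spill
flag), and six such records plus a 4-bit count fit into the u64. Purely
as an illustration (names are mine, and I'm assuming the count sits in
the low nibble with the entries packed above it), I picture one record
being built roughly like this:

        /* hypothetical sketch, not from the patch */
        static u64 pack_linked_reg_entry(u32 frameno, u32 spi_or_regno, bool is_reg)
        {
                u64 e = 0;

                e |= frameno & 0x7;              /* bits 0..2: frame number        */
                e |= (spi_or_regno & 0x3f) << 3; /* bits 3..8: regno or stack slot */
                e |= (u64)is_reg << 9;           /* bit 9: register vs spill       */
                return e;
        }

        /* whole vector: entries stacked above the low 4 'cnt' bits, cnt <= 6 */

If that matches your intent, ignore me; I just wanted to make sure I'm
not misreading the layout before commenting on the helpers below.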
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 2b54e25d2364..da450552c278 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -371,6 +371,10 @@ struct bpf_jmp_history_entry {
>         u32 prev_idx : 22;
>         /* special flags, e.g., whether insn is doing register stack spill/load */
>         u32 flags : 10;
> +       /* additional registers that need precision tracking when this
> +        * jump is backtracked, vector of six 10-bit records
> +        */
> +       u64 linked_regs;
>  };
>
>  /* Maximum number of register states that can exist at once */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index e25ad5fb9115..ec493360607e 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3335,9 +3335,87 @@ static bool is_jmp_point(struct bpf_verifier_env *env, int insn_idx)
>         return env->insn_aux_data[insn_idx].jmp_point;
>  }
>
> +#define ES_FRAMENO_BITS 3
> +#define ES_SPI_BITS     6
> +#define ES_ENTRY_BITS   (ES_SPI_BITS + ES_FRAMENO_BITS + 1)
> +#define ES_SIZE_BITS    4
> +#define ES_FRAMENO_MASK ((1ul << ES_FRAMENO_BITS) - 1)
> +#define ES_SPI_MASK     ((1ul << ES_SPI_BITS) - 1)
> +#define ES_SIZE_MASK    ((1ul << ES_SIZE_BITS) - 1)

ull for 32-bit arches?

> +#define ES_SPI_OFF      ES_FRAMENO_BITS
> +#define ES_IS_REG_OFF   (ES_SPI_BITS + ES_FRAMENO_BITS)

ES makes no sense now, no? LR or LINKREG or something along those lines?

> +#define LINKED_REGS_MAX 6
> +
> +struct reg_or_spill {

reg_or_spill -> linked_reg ?

> +       u8 frameno:3;
> +       union {
> +               u8 spi:6;
> +               u8 regno:6;
> +       };
> +       bool is_reg:1;
> +};

Do we need these bitfields for the unpacked representation? It's going
to use 2 bytes for this struct anyway. If you just use u8 for
everything you end up with 3 bytes. Bitfields are a bit slower because
the compiler will need to do more bit manipulation, so is it really
worth it? (See the sketch further down for what I mean.)

> +
> +struct linked_regs {
> +       int cnt;
> +       struct reg_or_spill entries[LINKED_REGS_MAX];
> +};
> +

[...]

> @@ -3615,6 +3739,12 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
>                 print_bpf_insn(&cbs, insn, env->allow_ptr_leaks);
>         }
>
> +       /* If there is a history record that some registers gained range at this insn,
> +        * propagate precision marks to those registers, so that bt_is_reg_set()
> +        * accounts for these registers.
> +        */
> +       bt_sync_linked_regs(bt, hist);
> +
>         if (class == BPF_ALU || class == BPF_ALU64) {
>                 if (!bt_is_reg_set(bt, dreg))
>                         return 0;
> @@ -3844,6 +3974,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
>                          */
>                         bt_set_reg(bt, dreg);
>                         bt_set_reg(bt, sreg);
> +               } else if (BPF_SRC(insn->code) == BPF_K) {
>                         /* else dreg <cond> K

drop "else" from the comment then?

I like this change.

>                          * Only dreg still needs precision before
>                          * this insn, so for the K-based conditional
> @@ -3862,6 +3993,10 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
>                 /* to be analyzed */
>                 return -ENOTSUPP;
>         }
> +       /* Propagate precision marks to linked registers, to account for
> +        * registers marked as precise in this function.
> +        */
> +       bt_sync_linked_regs(bt, hist);

Radical Andrii is fine with this, though I wonder if there is some
place outside of backtrack_insn() where the first bt_sync_linked_regs()
could be called just once? But regardless, this is only mildly
expensive when we do have linked registers, so it's unlikely to have
any noticeable performance effect.
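To make the bitfield question above concrete, what I had in mind is
simply this (untested, naming aside, just to show the shape; assuming
the rename to linked_reg suggested earlier):

        struct linked_reg {
                /* plain bytes: 3 bytes total instead of 2 for the
                 * bitfield version, but no shift/mask on every access
                 */
                u8 frameno;
                union {
                        u8 spi;
                        u8 regno;
                };
                bool is_reg;
        };

If I read the patch right, this struct is only a transient on-stack
representation anyway (the packed u64 is what actually goes into
jmp_history), so the extra byte per entry looks like a fair trade for
simpler codegen.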
>         return 0;
>  }
>
> @@ -4624,7 +4759,7 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>         }
>
>         if (insn_flags)
> -               return push_jmp_history(env, env->cur_state, insn_flags);
> +               return push_jmp_history(env, env->cur_state, insn_flags, 0);
>         return 0;
>  }
>
> @@ -4929,7 +5064,7 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env,
>                 insn_flags = 0; /* we are not restoring spilled register */
>         }
>         if (insn_flags)
> -               return push_jmp_history(env, env->cur_state, insn_flags);
> +               return push_jmp_history(env, env->cur_state, insn_flags, 0);
>         return 0;
>  }
>
> @@ -15154,14 +15289,66 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
>         return true;
>  }
>
> -static void find_equal_scalars(struct bpf_verifier_state *vstate,
> -                              struct bpf_reg_state *known_reg)
> +static void __find_equal_scalars(struct linked_regs *reg_set, struct bpf_reg_state *reg,
> +                                u32 id, u32 frameno, u32 spi_or_reg, bool is_reg)

we should abandon the "equal scalars" terminology; they don't have to
be equal, they are just linked together (potentially with a fixed
difference between them)

how about "collect_linked_regs"?

> +{
> +       struct reg_or_spill *e;
> +
> +       if (reg->type != SCALAR_VALUE || (reg->id & ~BPF_ADD_CONST) != id)

THIS is actually the place where I'd use

        u32 id:31;
        bool is_linked_reg:1;

just so that it's not so easy to accidentally forget about the
BPF_ADD_CONST flag (but it's unrelated to your patch). See the sketch
at the end of this email for what I mean.

> +               return;
> +
> +       e = linked_regs_push(reg_set);
> +       if (e) {
> +               e->frameno = frameno;
> +               e->is_reg = is_reg;
> +               e->regno = spi_or_reg;
> +       } else {
> +               reg->id = 0;
> +       }
> +}
> +

[...]

> @@ -15312,6 +15500,21 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
>                 return 0;
>         }
>
> +       /* Push scalar registers sharing same ID to jump history,
> +        * do this before creating 'other_branch', so that both
> +        * 'this_branch' and 'other_branch' share this history
> +        * if parent state is created.
> +        */
> +       if (BPF_SRC(insn->code) == BPF_X && src_reg->type == SCALAR_VALUE && src_reg->id)
> +               find_equal_scalars(this_branch, src_reg->id, &linked_regs);
> +       if (dst_reg->type == SCALAR_VALUE && dst_reg->id)
> +               find_equal_scalars(this_branch, dst_reg->id, &linked_regs);
> +       if (linked_regs.cnt > 1) {

if we have just one, should it even be marked as linked?

> +               err = push_jmp_history(env, this_branch, 0, linked_regs_pack(&linked_regs));
> +               if (err)
> +                       return err;
> +       }
> +
>         other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx,
>                                   false);
>         if (!other_branch)
> @@ -15336,13 +15539,13 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
>         if (BPF_SRC(insn->code) == BPF_X &&
>             src_reg->type == SCALAR_VALUE && src_reg->id &&
>             !WARN_ON_ONCE(src_reg->id != other_branch_regs[insn->src_reg].id)) {
> -               find_equal_scalars(this_branch, src_reg);
> -               find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg]);
> +               copy_known_reg(this_branch, src_reg, &linked_regs);
> +               copy_known_reg(other_branch, &other_branch_regs[insn->src_reg], &linked_regs);

I liked the "sync" terminology you used for bt, so why not call this
"sync_linked_regs"?

>         }
>         if (dst_reg->type == SCALAR_VALUE && dst_reg->id &&
>             !WARN_ON_ONCE(dst_reg->id != other_branch_regs[insn->dst_reg].id)) {
> -               find_equal_scalars(this_branch, dst_reg);
> -               find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);
> +               copy_known_reg(this_branch, dst_reg, &linked_regs);
> +               copy_known_reg(other_branch, &other_branch_regs[insn->dst_reg], &linked_regs);
>         }
>

[...]
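And to spell out the id:31 idea from above (again, material for a
separate patch, and purely a sketch on my side):

        /* hypothetical: in struct bpf_reg_state, carry the BPF_ADD_CONST
         * information as a separate bit instead of OR-ing the flag into id
         */
        u32 id:31;
        bool is_linked_reg:1;

With that, the check above becomes a plain reg->id != id and there is
no mask for the next person to forget.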