On Thu, 2022-12-01 at 03:14 +0200, Eduard Zingerman wrote: > On Wed, 2022-11-30 at 16:26 -0800, Andrii Nakryiko wrote: > > On Mon, Nov 28, 2022 at 8:35 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > > > > > Prior to this commit the following unsafe example passed verification: > > > > > > 1: r9 = ... some pointer with range X ... > > > 2: r6 = ... unbound scalar ID=a ... > > > 3: r7 = ... unbound scalar ID=b ... > > > 4: if (r6 > r7) goto +1 > > > 5: r6 = r7 > > > 6: if (r6 > X) goto ... ; <-- suppose checkpoint state is created here > > > 7: r9 += r7 > > > 8: *(u64 *)r9 = Y > > > > > > This example is unsafe because not all execution paths verify r7 range. > > > Because of the jump at (4) the verifier would arrive at (6) in two states: > > > I. r6{.id=b}, r7{.id=b} via path 1-6; > > > II. r6{.id=a}, r7{.id=b} via path 1-4, 6. > > > > > > Currently regsafe() does not call check_ids() for scalar registers, > > > thus from POV of regsafe() states (I) and (II) are identical. If the > > > path 1-6 is taken by verifier first and checkpoint is created at (6) > > > the path 1-4, 6 would be considered safe. > > > > > > This commit makes the following changes: > > > - a call to check_ids() is added in regsafe() for scalar registers case; > > > - a function mark_equal_scalars_as_read() is added to ensure that > > > registers with identical IDs are preserved in the checkpoint states > > > in case when find_equal_scalars() updates register range for several > > > registers sharing the same ID. > > > > > > > Fixes tag missing? > > > > These are tricky changes with subtle details. Let's split check_ids() > > change and all the liveness manipulations into separate patches? They > > are conceptually completely independent, right? > > > > > > > Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx> > > > --- > > > kernel/bpf/verifier.c | 87 ++++++++++++++++++++++++++++++++++++++++++- > > > 1 file changed, 85 insertions(+), 2 deletions(-) > > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > > index 6599d25dae38..8a5b7192514a 100644 > > > --- a/kernel/bpf/verifier.c > > > +++ b/kernel/bpf/verifier.c > > > @@ -10638,10 +10638,12 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn) > > > /* case: R1 = R2 > > > * copy register state to dest reg > > > */ > > > - if (src_reg->type == SCALAR_VALUE && !src_reg->id) > > > + if (src_reg->type == SCALAR_VALUE && !src_reg->id && > > > + !tnum_is_const(src_reg->var_off)) > > > /* Assign src and dst registers the same ID > > > * that will be used by find_equal_scalars() > > > * to propagate min/max range. > > > + * Skip constants to avoid allocation of useless ID. > > > */ > > > src_reg->id = ++env->id_gen; > > > *dst_reg = *src_reg; > > > @@ -11446,16 +11448,86 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn, > > > return true; > > > } > > > > > > +/* Scalar ID generation in check_alu_op() and logic of > > > + * find_equal_scalars() make the following pattern possible: > > > + * > > > + * 1: r9 = ... some pointer with range X ... > > > + * 2: r6 = ... unbound scalar ID=a ... > > > + * 3: r7 = ... unbound scalar ID=b ... > > > + * 4: if (r6 > r7) goto +1 > > > + * 5: r6 = r7 > > > + * 6: if (r6 > X) goto ... ; <-- suppose checkpoint state is created here > > > + * 7: r9 += r7 > > > + * 8: *(u64 *)r9 = Y > > > + * > > > + * Because of the jump at (4) the verifier would arrive at (6) in two states: > > > + * I. r6{.id=b}, r7{.id=b} > > > + * II. r6{.id=a}, r7{.id=b} > > > + * > > > + * Relevant facts: > > > + * - regsafe() matches ID mappings for scalars using check_ids(), this makes > > > + * states (I) and (II) non-equal; > > > + * - clean_func_state() removes registers not marked as REG_LIVE_READ from > > > + * checkpoint states; > > > + * - mark_reg_read() modifies reg->live for reg->parent (and it's parents); > > > + * - when r6 = r7 is process the bpf_reg_state is copied in full, meaning > > > + * that parent pointers are copied as well. > > > > not too familiar with liveness handling, but is this correct and > > expected? Should this be fixed instead of REG_LIVE_READ manipulations? TLDR: - looks like it is safe to avoid bpf_reg_state->parent update when registers are copied, but it has some performance impact. If this impact is acceptable I'd like to move this way are remove mark_equal_scalars_as_read(). - unrelated question: does anything has to be done to __mark_chain_precision to make it know about shared scalar IDs? E.g. in the following case: ... --- checkpoint 1 --- r6 would be marked as precise here r6 = r7 ... --- checkpoint 2 --- r6 won't be marked as precise here ... if r6 < 10 --- checkpoint 3 --- fp[r7] = 42 The additional precision marks could be inferred if additional info is added the jump stack. ----- Long version about bpf_reg_state->parent. All functions that copy register states: - save_register_state() Register is copied to stack spill location. - check_kfunc_mem_size_reg() Makes a tmp register copy, harmless. - sanitize_ptr_alu() Register copy is visible in the speculative branch. - check_alu_op() Register is copied when reg to reg MOV is processed. - find_equal_scalars() Register is copied to the registers with the same ID. Original commit introducing bpf_reg_state->parent: - 679c782de14bd48c19dd74cd1af20a2bc05dd936 2018-08-30 bpf/verifier: per-register parent pointers Updates mark_reg_read() and removes mark_stack_slot_read(). Previous version accessed parent state registers directly using regno w/o any additional manipulations. In this commit: - save_register_state() - does not exist yet, its function is performed by check_stack_write(), has a direct register state copy as in save_register_state(); - check_kfunc_mem_size_reg() - does not exist yet; - sanitize_ptr_alu() - does not exist yet; - check_alu_op() - has a direct register state copy unchanged for reg to reg MOV. Commit introducing find_equal_scalars(): - 75748837b7e56919679e02163f45d5818c644d03 2020-10-09 bpf: Propagate scalar ranges through register assignments. Has a direct register copy *reg = *known_reg. It looks like that for places where registers are copied ->parent change is an unintentional side effect. Removal of this side effect might in theory lead to some additional register read marks, e.g. in this case: r6 = r7 if (r6 > X) goto +42 r5 = r7 ; this would lead to a read mark on r7 not ; present previously. This should not hinder correctness. However, there is some negative performance impact: $ ./veristat -e file,prog,states -f 'states_pct!=0' -C master-baseline.log current.log File Program States (A) States (B) States (DIFF) ------------------------ -------------------------------- ---------- ---------- -------------- bpf_host.o cil_to_netdev 358 455 +97 (+27.09%) bpf_host.o tail_handle_nat_fwd_ipv4 1746 1891 +145 (+8.30%) bpf_host.o tail_handle_nat_fwd_ipv6 709 717 +8 (+1.13%) bpf_host.o tail_nodeport_ipv4_dsr 31 42 +11 (+35.48%) bpf_host.o tail_nodeport_nat_egress_ipv4 2269 2274 +5 (+0.22%) bpf_host.o tail_nodeport_nat_ingress_ipv4 276 316 +40 (+14.49%) bpf_host.o tail_nodeport_nat_ingress_ipv6 243 254 +11 (+4.53%) bpf_lxc.o tail_handle_nat_fwd_ipv4 1746 1891 +145 (+8.30%) bpf_lxc.o tail_handle_nat_fwd_ipv6 709 717 +8 (+1.13%) bpf_lxc.o tail_ipv4_ct_egress 248 251 +3 (+1.21%) bpf_lxc.o tail_ipv4_ct_ingress 248 251 +3 (+1.21%) bpf_lxc.o tail_ipv4_ct_ingress_policy_only 248 251 +3 (+1.21%) bpf_lxc.o tail_nodeport_ipv4_dsr 31 42 +11 (+35.48%) bpf_lxc.o tail_nodeport_nat_ingress_ipv4 276 316 +40 (+14.49%) bpf_lxc.o tail_nodeport_nat_ingress_ipv6 243 254 +11 (+4.53%) bpf_overlay.o tail_handle_nat_fwd_ipv4 1082 1109 +27 (+2.50%) bpf_overlay.o tail_nodeport_ipv4_dsr 31 42 +11 (+35.48%) bpf_overlay.o tail_nodeport_nat_egress_ipv4 2238 2243 +5 (+0.22%) bpf_overlay.o tail_nodeport_nat_ingress_ipv4 276 316 +40 (+14.49%) bpf_overlay.o tail_nodeport_nat_ingress_ipv6 243 254 +11 (+4.53%) bpf_sock.o cil_sock4_connect 47 64 +17 (+36.17%) bpf_sock.o cil_sock4_sendmsg 45 62 +17 (+37.78%) bpf_xdp.o tail_handle_nat_fwd_ipv4 1461 1912 +451 (+30.87%) bpf_xdp.o tail_lb_ipv4 4643 4738 +95 (+2.05%) bpf_xdp.o tail_nodeport_nat_egress_ipv4 1066 1069 +3 (+0.28%) bpf_xdp.o tail_rev_nodeport_lb4 404 411 +7 (+1.73%) bpf_xdp.o tail_rev_nodeport_lb6 1076 1083 +7 (+0.65%) pyperf600_bpf_loop.bpf.o on_event 285 287 +2 (+0.70%) xdp_synproxy_kern.bpf.o syncookie_tc 22513 22564 +51 (+0.23%) xdp_synproxy_kern.bpf.o syncookie_xdp 22207 24206 +1999 (+9.00%) ------------------------ -------------------------------- ---------- ---------- -------------- > > Well, that's what I wanted to ask, actually :) > Here is how current logic works: > - is_state_visited() has the following two loops in the end: > > for (j = 0; j <= cur->curframe; j++) { > for (i = j < cur->curframe ? BPF_REG_6 : 0; i < BPF_REG_FP; i++) > cur->frame[j]->regs[i].parent = &new->frame[j]->regs[i]; > for (i = 0; i < BPF_REG_FP; i++) > cur->frame[j]->regs[i].live = REG_LIVE_NONE; > } > > /* all stack frames are accessible from callee, clear them all */ > for (j = 0; j <= cur->curframe; j++) { > struct bpf_func_state *frame = cur->frame[j]; > struct bpf_func_state *newframe = new->frame[j]; > > for (i = 0; i < frame->allocated_stack / BPF_REG_SIZE; i++) { > frame->stack[i].spilled_ptr.live = REG_LIVE_NONE; > frame->stack[i].spilled_ptr.parent = > &newframe->stack[i].spilled_ptr; > } > } > > These connect the bpf_reg_state members of the new state with > corresponding (index-wise) members of the parent state. > - find_equal_scalars() looks as follows: > static void find_equal_scalars(struct bpf_verifier_state *vstate, > struct bpf_reg_state *known_reg) > { > struct bpf_func_state *state; > struct bpf_reg_state *reg; > > bpf_for_each_reg_in_vstate(vstate, state, reg, ({ > if (reg->type == SCALAR_VALUE && reg->id == known_reg->id) > *reg = *known_reg; // <--- full copy, incl. parent pointer > })); > } > - mark_reg_read() updates the ->live field of the *parent* register > when called only if ->live field of the *current* register is not > marked as written. > - in case if register is overwritten it's ->live field is marked as > written, e.g. see check_stack_read_fixed_off(). > > Suppose we have an example: > > ---- checkpoint ---- > r1 = r0 ; now r1.parent == &checkpoint->regs[0] > r2 = r1 ; now r2.parent == &checkpoint->regs[0] > if (r1 == 0) goto +42 > ... > > Given the above logic only &checkpoint->regs[0] would receive read > marks. Although I'm not the original author but this behavior seem to > make sense. > > > > > > + * > > > + * Thus, for execution path 1-6: > > > + * - both r6->parent and r7->parent point to the same register in the parent state (r7); > > > + * - only *one* register in the checkpoint state would receive REG_LIVE_READ mark; > > > > I'm trying to understand this. Clearly both r6 and r7 are read. r6 for > > if (r6 > X) check, r7 for r9 manipulations. Why do we end up not > > marking one of them as read using a normal logic? > > When (r6 > X) is processed find_equal_scalars() updates parent > pointers for all registers with the same ID as r6, in this case only > for r7. So, after find_equal_scalars() is done both current state r6 > and r7 ->parent point to the r6 of the latest checkpoint state. > > > > > I have this bad feeling I'm missing something very important here or > > we have some bug somewhere else. So please help me understand which > > one it is. This special liveness manipulation seems wrong. > > > > My concern is that if I have some code like > > > > r6 = r7; > > r9 += r6; > > > > and I never use r7 anymore after that, then we should be able to > > forget r7 and treat it as NOT_INIT. But you are saying it's unsafe > > right now and that doesn't make much sense to me. > > It is unsafe because of the "spooky action at a distance" produced by > a combination of: > - allocation of scalar IDs for moves, see check_alu_op() case for > 64-bit move; > - find_equal_scalars() that propagates range, this one is only > executed for conditional jumps. > > > > > > > > + * - clean_func_state() would remove r6 from checkpoint state (mark it NOT_INIT). > > > + * > > > + * Consequently, when execution path 1-4, 6 reaches (6) in state (II) > > > + * regsafe() won't be able to see a mismatch in ID mappings. > > > + * > > > + * To avoid this issue mark_equal_scalars_as_read() conservatively > > > + * marks all registers with matching ID as REG_LIVE_READ, thus > > > + * preserving r6 and r7 in the checkpoint state for the example above. > > > + */ > > > +static void mark_equal_scalars_as_read(struct bpf_verifier_state *vstate, int id) > > > +{ > > > + struct bpf_verifier_state *st; > > > + struct bpf_func_state *state; > > > + struct bpf_reg_state *reg; > > > + bool move_up; > > > + int i = 0; > > > + > > > + for (st = vstate, move_up = true; st && move_up; st = st->parent) { > > > + move_up = false; > > > + bpf_for_each_reg_in_vstate(st, state, reg, ({ > > > + if (reg->type == SCALAR_VALUE && reg->id == id && > > > + !(reg->live & REG_LIVE_READ)) { > > > + reg->live |= REG_LIVE_READ; > > > + move_up = true; > > > + } > > > + ++i; > > > + })); > > > + } > > > +} > > > + > > > static void find_equal_scalars(struct bpf_verifier_state *vstate, > > > struct bpf_reg_state *known_reg) > > > { > > > struct bpf_func_state *state; > > > struct bpf_reg_state *reg; > > > + int count = 0; > > > > > > bpf_for_each_reg_in_vstate(vstate, state, reg, ({ > > > - if (reg->type == SCALAR_VALUE && reg->id == known_reg->id) > > > + if (reg->type == SCALAR_VALUE && reg->id == known_reg->id) { > > > *reg = *known_reg; > > > + ++count; > > > + } > > > })); > > > + > > > + /* Count equal to 1 means that find_equal_scalars have not > > > + * found any registers with the same ID (except self), thus > > > + * the range knowledge have not been transferred and there is > > > + * no need to preserve registers with the same ID in a parent > > > + * state. > > > + */ > > > + if (count > 1) > > > + mark_equal_scalars_as_read(vstate->parent, known_reg->id); > > > } > > > > > > static int check_cond_jmp_op(struct bpf_verifier_env *env, > > > @@ -12878,6 +12950,12 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold, > > > */ > > > return equal && rold->frameno == rcur->frameno; > > > > > > + /* even if two registers are identical the id mapping might diverge > > > + * e.g. rold{.id=1}, rcur{.id=1}, idmap{1->2} > > > + */ > > > + if (equal && rold->type == SCALAR_VALUE && rold->id) > > > + return check_ids(rold->id, rcur->id, idmap); > > > > nit: let's teach check_ids() to handle the id == 0 case properly > > instead of guarding everything with `if (rold->id)`? > > > > but also I think this applies not just to SCALARs, right? the memcmp() > > check above has to be augmented with check_ids() for id and ref_obj_id > > Yes, it is the same issue as described in [1] as you pointed out. > I'll updated it for other branches, but I want the main issue to > be sorted out first. > > [1] https://lore.kernel.org/bpf/CAEf4BzbFB5g4oUfyxk9rHy-PJSLQ3h8q9mV=rVoXfr_JVm8+1Q@xxxxxxxxxxxxxx/ > > > > > > + > > > if (equal) > > > return true; > > > > > > @@ -12891,6 +12969,11 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold, > > > if (env->explore_alu_limits) > > > return false; > > > if (rcur->type == SCALAR_VALUE) { > > > + /* id relations must be preserved, see comment in > > > + * mark_equal_scalars_as_read() for SCALAR_VALUE example. > > > + */ > > > + if (rold->id && !check_ids(rold->id, rcur->id, idmap)) > > > + return false; > > > if (!rold->precise) > > > return true; > > > /* new val must satisfy old val knowledge */ > > > -- > > > 2.34.1 > > > >