Re: [PATCH bpf-next v3 3/4] bpf: verify scalar ids mapping in regsafe() using check_ids()

On Thu, 2023-06-08 at 10:21 -0700, Andrii Nakryiko wrote:
> On Wed, Jun 7, 2023 at 6:26 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> > 
> > On Wed, 2023-06-07 at 14:40 -0700, Andrii Nakryiko wrote:
> > > On Tue, Jun 6, 2023 at 3:24 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> > > > 
> > > > Make sure that the following unsafe example is rejected by verifier:
> > > > 
> > > > 1: r9 = ... some pointer with range X ...
> > > > 2: r6 = ... unbound scalar ID=a ...
> > > > 3: r7 = ... unbound scalar ID=b ...
> > > > 4: if (r6 > r7) goto +1
> > > > 5: r6 = r7
> > > > 6: if (r6 > X) goto ...
> > > > --- checkpoint ---
> > > > 7: r9 += r7
> > > > 8: *(u64 *)r9 = Y
> > > > 
> > > > This example is unsafe because not all execution paths verify r7's range.
> > > > Because of the jump at (4) the verifier would arrive at (6) in two states:
> > > > I.  r6{.id=b}, r7{.id=b} via path 1-6;
> > > > II. r6{.id=a}, r7{.id=b} via path 1-4, 6.
> > > > 
> > > > Currently regsafe() does not call check_ids() for scalar registers,
> > > > thus from the POV of regsafe() states (I) and (II) are identical. If
> > > > path 1-6 is taken by the verifier first and a checkpoint is created
> > > > at (6), path [1-4, 6] would be considered safe.
> > > > 
> > > > This commit updates regsafe() to call check_ids() for precise scalar
> > > > registers.
> > > > 
> > > > To minimize the impact on verification performance, avoid generating
> > > > bpf_reg_state::id for constant scalar values when processing BPF_MOV
> > > > in check_alu_op(). Scalar IDs are utilized by find_equal_scalars() to
> > > > propagate information about value ranges for registers that hold the
> > > > same value. However, there is no need to propagate range information
> > > > for constants.
> > > > 
> > > > Still, there is some performance impact because of this change.
> > > > Using veristat to compare number of processed states for selftests
> > > > object files listed in tools/testing/selftests/bpf/veristat.cfg and
> > > > Cilium object files from [1] gives the following statistics:
> > > > 
> > > > $ ./veristat -e file,prog,states -f "states_pct>10" \
> > > >     -C master-baseline.log current.log
> > > > File         Program                         States  (DIFF)
> > > > -----------  ------------------------------  --------------
> > > > bpf_xdp.o    tail_handle_nat_fwd_ipv6        +155 (+23.92%)
> > > > bpf_xdp.o    tail_nodeport_nat_ingress_ipv4  +102 (+27.20%)
> > > > bpf_xdp.o    tail_rev_nodeport_lb4            +83 (+20.85%)
> > > > loop6.bpf.o  trace_virtqueue_add_sgs          +25 (+11.06%)
> > > > 
> > > > Also, test case verifier_search_pruning/allocated_stack has to be
> > > > updated to avoid conflicts in register ID assignments between cached
> > > > and new states.
> > > > 
> > > > [1] git@xxxxxxxxxx:anakryiko/cilium.git
> > > > 
> > > > Fixes: 75748837b7e5 ("bpf: Propagate scalar ranges through register assignments.")
> > > > Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> > > > ---
> > > 
> > > So I checked it also on our internal BPF object files, and it looks
> > > mostly good. Here are the only regressions:
> > > 
> > > Program                                   States (A)  States (B)  States   (DIFF)
> > > ----------------------------------------  ----------  ----------  ---------------
> > > balancer_ingress                               29219       34531  +5312 (+18.18%)
> > > syar_bind6_protect6                             3257        3599   +342 (+10.50%)
> > > syar_bind4_protect4                             2590        2931   +341 (+13.17%)
> > > on_alloc                                         415         526   +111 (+26.75%)
> > > on_free                                          406         517   +111 (+27.34%)
> > > pycallcount                                      395         506   +111 (+28.10%)
> > > resume_context                                   405         516   +111 (+27.41%)
> > > on_py_event                                      395         506   +111 (+28.10%)
> > > on_event                                         284         394   +110 (+38.73%)
> > > handle_cuda_event                                268         378   +110 (+41.04%)
> > > handle_cuda_launch                               276         386   +110 (+39.86%)
> > > handle_cuda_malloc_ret                           272         382   +110 (+40.44%)
> > > handle_cuda_memcpy                               270         380   +110 (+40.74%)
> > > handle_cuda_memcpy_async                         270         380   +110 (+40.74%)
> > > handle_pytorch_allocate_ret                      271         381   +110 (+40.59%)
> > > handle_pytorch_malloc_ret                        272         382   +110 (+40.44%)
> > > on_event                                         284         394   +110 (+38.73%)
> > > on_event                                         284         394   +110 (+38.73%)
> > > syar_task_enter_execve                           309         329    +20 (+6.47%)
> > > kprobe__security_inode_link                      968         986    +18 (+1.86%)
> > > kprobe__security_inode_symlink                   838         854    +16 (+1.91%)
> > > tw_twfw_egress                                   249         251     +2 (+0.80%)
> > > tw_twfw_ingress                                  250         252     +2 (+0.80%)
> > > tw_twfw_tc_eg                                    248         250     +2 (+0.81%)
> > > tw_twfw_tc_in                                    250         252     +2 (+0.80%)
> > > raw_tracepoint__sched_process_exec               136         139     +3 (+2.21%)
> > > kprobe_ret__do_filp_open                         869         871     +2 (+0.23%)
> > > read_erlang_stack                                572         573     +1 (+0.17%)
> > > 
> > > 
> > > They are mostly on small-ish programs. The only mild concern from my
> > > side is balancer_ingress, which is one of the Katran BPF programs. It
> > > adds +18% states (which translates to about 70K more instructions
> > > verified, up from 350K). I think we can live with this, but it would
> > > be nice to check why it's happening.
> > 
> > Thank you for reviewing this series.
> > 
> > I looked at the logs that you've shared; the difference is indeed
> > caused by some scalar registers having a unique ID in the cached state
> > and no ID in the current state, or vice versa. The !rold->id trick
> > that we discussed for V2 helps :)
> > 
> > What do you think about an alternative way to exclude unique scalars,
> > as in the patch below (on top of this patch-set)?
> > 
> > --- 8< -------------------------
> > 
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 235d7eded565..ece9722dff3b 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -15149,6 +15149,13 @@ static bool check_ids(u32 old_id, u32 cur_id, struct bpf_id_pair *idmap)
> >         return false;
> >  }
> > 
> > +static bool check_scalar_ids(u32 old_id, u32 cur_id, struct bpf_verifier_env *env)
> > +{
> > +       old_id = old_id ? old_id : env->id_gen++;
> > +       cur_id = cur_id ? cur_id : env->id_gen++;
> > +       return check_ids(old_id, cur_id, env->idmap_scratch);
> > +}
> > +
> >  static void clean_func_state(struct bpf_verifier_env *env,
> >                              struct bpf_func_state *st)
> >  {
> > @@ -15325,7 +15332,7 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
> >                  */
> >                 return range_within(rold, rcur) &&
> >                        tnum_in(rold->var_off, rcur->var_off) &&
> > -                      check_ids(rold->id, rcur->id, idmap);
> > +                      check_scalar_ids(rold->id, rcur->id, env);
> >         case PTR_TO_MAP_KEY:
> >         case PTR_TO_MAP_VALUE:
> >         case PTR_TO_MEM:
> > 
> > ------------------------- >8 ---
> > 
> > For me this patch removes all veristat differences compared to
> > master. If doing it for real, I'd like to reset env->id_gen at exit
> > from states_equal() to the value it had at entry, to avoid allocating
> > too many ids. See the sketch below.
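> > 
> > A minimal sketch of that reset (names are made up for illustration:
> > states_equal_impl() stands in for the current states_equal() body,
> > this is not a definitive implementation):
> > 
> > 	static bool states_equal(struct bpf_verifier_env *env,
> > 				 struct bpf_verifier_state *old,
> > 				 struct bpf_verifier_state *cur)
> > 	{
> > 		u32 saved_id_gen = env->id_gen;
> > 		bool ret;
> > 
> > 		/* hypothetical split: existing comparison logic moved
> > 		 * verbatim into states_equal_impl()
> > 		 */
> > 		ret = states_equal_impl(env, old, cur);
> > 		/* ids minted by check_scalar_ids() are only meaningful
> > 		 * within a single comparison, so roll the counter back
> > 		 * to avoid burning ids on every states_equal() call
> > 		 */
> > 		env->id_gen = saved_id_gen;
> > 		return ret;
> > 	}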
> 
> Hm.. It's clever and pretty minimal, I like it. We are basically
> allocating a virtual ID for a SCALAR that doesn't have an id, just to
> make sure we get a conflict if a SCALAR with an ID cannot be mapped
> into two different SCALARs, right?
> 
> The only question would be whether it's safe to do that in the case
> when old_reg->id != 0 and cur_reg->id == 0. E.g., if in the old
> (verified) state we have r6.id = r7.id = 123, and in the new state we
> have r6.id = 0 and r7.id = 0, then your implementation above will say
> that the states are equivalent. But are they, given that there is a
> link between r6 and r7 that might be required for correctness, and
> which we don't have in the current state?

You mean the other way around: rold.id == 0, rcur.id != 0, right?
(Below, "0/2" means: original value 0, replaced by new id 2.)

(1)    old  cur
r6.id  0/2    1
r7.id  0/3    1    check_ids returns true

(2)    old  cur
r6.id    1  0/2
r7.id    1  0/3    check_ids returns false

Also, (1) is no different from (3):

(3)    old  cur
r6.id    1    3
r7.id    2    3    check_ids returns true

Current check:

		if (!rold->precise)
			return true;
		return range_within(rold, rcur) &&
		       tnum_in(rold->var_off, rcur->var_off) &&
		       check_ids(rold->id, rcur->id, idmap);

Will reject (1) and (2), but will accept (3).

New check:

		if (!rold->precise)
			return true;
		return range_within(rold, rcur) &&
		       tnum_in(rold->var_off, rcur->var_off) &&
		       check_scalar_ids(rold->id, rcur->id, env);

Will reject (2), but will accept (1) and (3).

And modifying check_scalar_ids() to generate IDs only for rold or
only for rcur would not reject (3) either.

(3) would be rejected only if the current check were used together
with elimination of unique scalar IDs from old states.

My previous experiments show that eliminating unique IDs from old
states and not eliminating unique IDs from cur states is *very* bad
for performance.
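
For reference, a small userspace program that replays cases (1)-(3)
through the check_scalar_ids() logic. This is a simplified
re-implementation for illustration, not the kernel code: it assumes
the plain one-directional old->cur matching exercised by the tables
above and omits the kernel's map-full handling.

	#include <stdbool.h>
	#include <stdio.h>

	#define ID_MAP_SIZE 16

	struct id_pair {
		unsigned int old;
		unsigned int cur;
	};

	/* fresh ids for scalars that have none; starts above any
	 * "real" id used in the examples
	 */
	static unsigned int id_gen = 100;

	/* same matching rule as discussed above: remember old->cur
	 * pairs, fail if an old id would have to map to two
	 * different cur ids
	 */
	static bool check_ids(unsigned int old_id, unsigned int cur_id,
			      struct id_pair *idmap)
	{
		int i;

		for (i = 0; i < ID_MAP_SIZE; i++) {
			if (!idmap[i].old) {
				/* empty slot: first time we see old_id */
				idmap[i].old = old_id;
				idmap[i].cur = cur_id;
				return true;
			}
			if (idmap[i].old == old_id)
				return idmap[i].cur == cur_id;
		}
		return false;
	}

	/* the proposed check_scalar_ids(): scalars without an id get
	 * a fresh one, so a shared id on one side cannot be matched
	 * by two unrelated registers on the other side
	 */
	static bool check_scalar_ids(unsigned int old_id, unsigned int cur_id,
				     struct id_pair *idmap)
	{
		old_id = old_id ? old_id : id_gen++;
		cur_id = cur_id ? cur_id : id_gen++;
		return check_ids(old_id, cur_id, idmap);
	}

	static void run_case(const char *name,
			     unsigned int r6_old, unsigned int r6_cur,
			     unsigned int r7_old, unsigned int r7_cur)
	{
		struct id_pair idmap[ID_MAP_SIZE] = {};
		bool ok;

		ok = check_scalar_ids(r6_old, r6_cur, idmap) &&
		     check_scalar_ids(r7_old, r7_cur, idmap);
		printf("%s: check_ids returns %s\n",
		       name, ok ? "true" : "false");
	}

	int main(void)
	{
		run_case("(1)", 0, 1, 0, 1);	/* true  */
		run_case("(2)", 1, 0, 1, 0);	/* false */
		run_case("(3)", 1, 3, 2, 3);	/* true  */
		return 0;
	}

Its output matches the annotations in the three tables above:
(1) and (3) are accepted, (2) is rejected.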

> 
> So with this we are getting to my original concerns with your
> !rold->id approach, which tries to ignore the necessity of a link
> between registers.
> 
> What if we do this only for old registers? Then, (in old state) r6.id
> = 0, r7.id = 0, (in new state) r6.id = r7.id = 123. This will be
> rejected because first we'll map 123 to a newly allocated ID X for
> r6.id, and then when we try to match r7.id=123 to another allocated
> ID X+1 we'll get a conflict, right?
> 
> > 
> > > 
> > > I suspect that dropping SCALAR IDs as we discussed (after fixing
> > > register fill/spill ID generation) might completely mitigate that.
> > > 
> > > Overall, LGTM:
> > > 
> > > Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > > 
> > > >  kernel/bpf/verifier.c                         | 34 ++++++++++++++++---
> > > >  .../bpf/progs/verifier_search_pruning.c       |  3 +-
> > > >  2 files changed, 32 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index 2aa60b73f1b5..175ca22b868e 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -12933,12 +12933,14 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > > >                 if (BPF_SRC(insn->code) == BPF_X) {
> > > >                         struct bpf_reg_state *src_reg = regs + insn->src_reg;
> > > >                         struct bpf_reg_state *dst_reg = regs + insn->dst_reg;
> > > > +                       bool need_id = (src_reg->type == SCALAR_VALUE && !src_reg->id &&
> > > > +                                       !tnum_is_const(src_reg->var_off));
> > > > 
> > > 
> > > nit: unnecessary outer ()
> > > 
> > > >                         if (BPF_CLASS(insn->code) == BPF_ALU64) {
> > > >                                 /* case: R1 = R2
> > > >                                  * copy register state to dest reg
> > > >                                  */
> > > > -                               if (src_reg->type == SCALAR_VALUE && !src_reg->id)
> > > > +                               if (need_id)
> > > >                                         /* Assign src and dst registers the same ID
> > > >                                          * that will be used by find_equal_scalars()
> > > >                                          * to propagate min/max range.
> > > 
> > > [...]
> > 