Re: [PATCH bpf-next v1 1/2] bpf: verify scalar ids mapping in regsafe() using check_ids()

Eduard Zingerman <eddyz87@xxxxxxxxx> · Sat, 27 May 2023 15:29:53 +0300

On Sat, 2023-05-27 at 15:21 +0300, Eduard Zingerman wrote:
[...]
> > > @@ -15151,6 +15153,33 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
> > >   
> > >   	switch (base_type(rold->type)) {
> > >   	case SCALAR_VALUE:
> > > +		/* Why check_ids() for precise registers?
> > > +		 *
> > > +		 * Consider the following BPF code:
> > > +		 *   1: r6 = ... unbound scalar, ID=a ...
> > > +		 *   2: r7 = ... unbound scalar, ID=b ...
> > > +		 *   3: if (r6 > r7) goto +1
> > > +		 *   4: r6 = r7
> > > +		 *   5: if (r6 > X) goto ...
> > > +		 *   6: ... memory operation using r7 ...
> > > +		 *
> > > +		 * First verification path is [1-6]:
> > > +		 * - at (4) same bpf_reg_state::id (b) would be assigned to r6 and r7;
> > > +		 * - at (5) r6 would be marked <= X, find_equal_scalars() would also mark
> > > +		 *   r7 <= X, because r6 and r7 share same id.
> > > +		 *
> > > +		 * Next verification path would start from (5), because of the jump at (3).
> > > +		 * The only state difference between first and second visits of (5) is
> > > +		 * bpf_reg_state::id assignments for r6 and r7: (b, b) vs (a, b).
> > > +		 * Thus, use check_ids() to distinguish these states.
> > > +		 *
> > > +		 * The `rold->precise` check is a performance optimization. If `rold->id`
> > > +		 * was ever used to access memory / predict jump, the `rold` or any
> > > +		 * register used in `rold = r?` / `r? = rold` operations would be marked
> > > +		 * as precise, otherwise it's ID is not really interesting.
> > > +		 */
> > > +		if (rold->precise && rold->id && !check_ids(rold->id, rcur->id, idmap))
> > 
> > Do we need rold->id checking in the above? check_ids should have 
> > rold->id = 0 properly. Or this is just an optimization?
> 
> You are correct, the check_ids() handles this case and it should be inlined,
> so there is no need to check rold->id in this 'if' branch.
>  
> > regs_exact() has check_ids as well. Not sure whether it makes sense to
> > create a function regs_exact_scalar() just for scalar and include the
> > above code. Otherwise, it is strange we do check_ids in different
> > places.
> 
> I'm not sure how to best re-organize code here, regs_exact() is a nice
> compartmentalized abstraction. It is possible to merge my additional
> check_ids() call with the main 'precise' processing part as below:
> 
> @@ -15152,21 +15154,22 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
>         switch (base_type(rold->type)) {
>         case SCALAR_VALUE:
>                 if (regs_exact(rold, rcur, idmap))
>                         return true;
>                 if (env->explore_alu_limits)
>                         return false;
>                 if (!rold->precise)
>                         return true;
>                 /* new val must satisfy old val knowledge */
>                 return range_within(rold, rcur) &&
> -                      tnum_in(rold->var_off, rcur->var_off);
> +                      tnum_in(rold->var_off, rcur->var_off) &&
> +                      check_ids(rold->id, rcur->id, idmap);
> 
> I'd say that extending /* new val must satisfy ... */ comment to
> explain why check_ids() is needed should be sufficient, but I'm open
> for suggestions.

On the other hand, I wanted to have a separate 'if' branch like:

  if (rold->precise && !check_ids(rold->id, rcur->id, idmap))

Specifically to explain that 'rold->precise' part is an optimization.

> 
> > 
> > > +			return false;
> > >   		if (regs_exact(rold, rcur, idmap))
> > >   			return true;
> > >   		if (env->explore_alu_limits)
>