Re: [PATCH bpf-next v1 1/2] bpf: verify scalar ids mapping in regsafe() using check_ids()

Yonghong Song <yhs@xxxxxxxx> · Sat, 27 May 2023 16:43:42 -0700






On 5/27/23 5:29 AM, Eduard Zingerman wrote:
On Sat, 2023-05-27 at 15:21 +0300, Eduard Zingerman wrote:
[...]
@@ -15151,6 +15153,33 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
   
   	switch (base_type(rold->type)) {
   	case SCALAR_VALUE:
+		/* Why check_ids() for precise registers?
+		 *
+		 * Consider the following BPF code:
+		 *   1: r6 = ... unbound scalar, ID=a ...
+		 *   2: r7 = ... unbound scalar, ID=b ...
+		 *   3: if (r6 > r7) goto +1
+		 *   4: r6 = r7
+		 *   5: if (r6 > X) goto ...
+		 *   6: ... memory operation using r7 ...
+		 *
+		 * First verification path is [1-6]:
+		 * - at (4) same bpf_reg_state::id (b) would be assigned to r6 and r7;
+		 * - at (5) r6 would be marked <= X, find_equal_scalars() would also mark
+		 *   r7 <= X, because r6 and r7 share same id.
+		 *
+		 * Next verification path would start from (5), because of the jump at (3).
+		 * The only state difference between first and second visits of (5) is
+		 * bpf_reg_state::id assignments for r6 and r7: (b, b) vs (a, b).
+		 * Thus, use check_ids() to distinguish these states.
+		 *
+		 * The `rold->precise` check is a performance optimization. If `rold->id`
+		 * was ever used to access memory / predict jump, the `rold` or any
+		 * register used in `rold = r?` / `r? = rold` operations would be marked
+		 * as precise, otherwise it's ID is not really interesting.
+		 */
+		if (rold->precise && rold->id && !check_ids(rold->id, rcur->id, idmap))

Do we need rold->id checking in the above? check_ids should have
rold->id = 0 properly. Or this is just an optimization?

You are correct, the check_ids() handles this case and it should be inlined,
so there is no need to check rold->id in this 'if' branch.
  
regs_exact() has check_ids as well. Not sure whether it makes sense to
create a function regs_exact_scalar() just for scalar and include the
above code. Otherwise, it is strange we do check_ids in different
places.

I'm not sure how to best re-organize code here, regs_exact() is a nice
compartmentalized abstraction. It is possible to merge my additional
check_ids() call with the main 'precise' processing part as below:

@@ -15152,21 +15154,22 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
         switch (base_type(rold->type)) {
         case SCALAR_VALUE:
                 if (regs_exact(rold, rcur, idmap))
                         return true;
                 if (env->explore_alu_limits)
                         return false;
                 if (!rold->precise)
                         return true;
                 /* new val must satisfy old val knowledge */
                 return range_within(rold, rcur) &&
-                      tnum_in(rold->var_off, rcur->var_off);
+                      tnum_in(rold->var_off, rcur->var_off) &&
+                      check_ids(rold->id, rcur->id, idmap);

I'd say that extending /* new val must satisfy ... */ comment to
explain why check_ids() is needed should be sufficient, but I'm open
for suggestions.

On the other hand, I wanted to have a separate 'if' branch like:

   if (rold->precise && !check_ids(rold->id, rcur->id, idmap))
   
Specifically to explain that 'rold->precise' part is an optimization.

Okay, I think you could keep your original implementation. I do think
checking rold->ref_obj_id in regs_exact is not needed for
SCALAR_VALUE but it may not be that important. The check_ids
checking in reg_exact (for SCALAR_VALUE) can also be skipped
if !rold->precise as an optimization. That is why I mention
to 'inline' regs_exact and re-arrange the codes. You can still
mention that optimization w.r.t. rold->precise. Overall if the code
is more complex, I am okay with your current change.




+			return false;
   		if (regs_exact(rold, rcur, idmap))
   			return true;
   		if (env->explore_alu_limits)