[PATCH bpf-next v1 1/2] bpf: verify scalar ids mapping in regsafe() using check_ids()

Eduard Zingerman <eddyz87@xxxxxxxxx> · Fri, 26 May 2023 21:41:25 +0300

Make sure that the following unsafe example is rejected by verifier:

1: r9 = ... some pointer with range X ...
2: r6 = ... unbound scalar ID=a ...
3: r7 = ... unbound scalar ID=b ...
4: if (r6 > r7) goto +1
5: r6 = r7
6: if (r6 > X) goto ...
--- checkpoint ---
7: r9 += r7
8: *(u64 *)r9 = Y

This example is unsafe because not all execution paths verify r7 range.
Because of the jump at (4) the verifier would arrive at (6) in two states:
I.  r6{.id=b}, r7{.id=b} via path 1-6;
II. r6{.id=a}, r7{.id=b} via path 1-4, 6.

Currently regsafe() does not call check_ids() for scalar registers,
thus from POV of regsafe() states (I) and (II) are identical. If the
path 1-6 is taken by verifier first, and checkpoint is created at (6)
the path [1-4, 6] would be considered safe.

This commit updates regsafe() to call check_ids() for scalar registers.

The change in check_alu_op() to avoid assigning scalar id to constants
is performance optimization. W/o it the regsafe() change becomes
costly for some programs, e.g. for
tools/testing/selftests/bpf/progs/pyperf600.c the difference is:

File             Program   States (A)  States (B)  States    (DIFF)
---------------  --------  ----------  ----------  ----------------
pyperf600.bpf.o  on_event       22200       37060  +14860 (+66.94%)

Where A -- this patch,
      B -- this patch but w/o check_alu_op() changes.

Fixes: 75748837b7e5 ("bpf: Propagate scalar ranges through register assignments.")
Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
---
 kernel/bpf/verifier.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index af70dad655ab..624556eda430 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12806,10 +12806,12 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 				/* case: R1 = R2
 				 * copy register state to dest reg
 				 */
-				if (src_reg->type == SCALAR_VALUE && !src_reg->id)
+				if (src_reg->type == SCALAR_VALUE && !src_reg->id &&
+				    !tnum_is_const(src_reg->var_off))
 					/* Assign src and dst registers the same ID
 					 * that will be used by find_equal_scalars()
 					 * to propagate min/max range.
+					 * Skip constants to avoid allocation of useless ID.
 					 */
 					src_reg->id = ++env->id_gen;
 				copy_register_state(dst_reg, src_reg);
@@ -15151,6 +15153,33 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
 
 	switch (base_type(rold->type)) {
 	case SCALAR_VALUE:
+		/* Why check_ids() for precise registers?
+		 *
+		 * Consider the following BPF code:
+		 *   1: r6 = ... unbound scalar, ID=a ...
+		 *   2: r7 = ... unbound scalar, ID=b ...
+		 *   3: if (r6 > r7) goto +1
+		 *   4: r6 = r7
+		 *   5: if (r6 > X) goto ...
+		 *   6: ... memory operation using r7 ...
+		 *
+		 * First verification path is [1-6]:
+		 * - at (4) same bpf_reg_state::id (b) would be assigned to r6 and r7;
+		 * - at (5) r6 would be marked <= X, find_equal_scalars() would also mark
+		 *   r7 <= X, because r6 and r7 share same id.
+		 *
+		 * Next verification path would start from (5), because of the jump at (3).
+		 * The only state difference between first and second visits of (5) is
+		 * bpf_reg_state::id assignments for r6 and r7: (b, b) vs (a, b).
+		 * Thus, use check_ids() to distinguish these states.
+		 *
+		 * The `rold->precise` check is a performance optimization. If `rold->id`
+		 * was ever used to access memory / predict jump, the `rold` or any
+		 * register used in `rold = r?` / `r? = rold` operations would be marked
+		 * as precise, otherwise it's ID is not really interesting.
+		 */
+		if (rold->precise && rold->id && !check_ids(rold->id, rcur->id, idmap))
+			return false;
 		if (regs_exact(rold, rcur, idmap))
 			return true;
 		if (env->explore_alu_limits)
-- 
2.40.1