On 5/27/23 5:29 AM, Eduard Zingerman wrote:
On Sat, 2023-05-27 at 15:21 +0300, Eduard Zingerman wrote:
[...]
@@ -15151,6 +15153,33 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
switch (base_type(rold->type)) {
case SCALAR_VALUE:
+ /* Why check_ids() for precise registers?
+ *
+ * Consider the following BPF code:
+ * 1: r6 = ... unbound scalar, ID=a ...
+ * 2: r7 = ... unbound scalar, ID=b ...
+ * 3: if (r6 > r7) goto +1
+ * 4: r6 = r7
+ * 5: if (r6 > X) goto ...
+ * 6: ... memory operation using r7 ...
+ *
+ * First verification path is [1-6]:
+ * - at (4) same bpf_reg_state::id (b) would be assigned to r6 and r7;
+ * - at (5) r6 would be marked <= X, find_equal_scalars() would also mark
+ * r7 <= X, because r6 and r7 share same id.
+ *
+ * Next verification path would start from (5), because of the jump at (3).
+ * The only state difference between first and second visits of (5) is
+ * bpf_reg_state::id assignments for r6 and r7: (b, b) vs (a, b).
+ * Thus, use check_ids() to distinguish these states.
+ *
+ * The `rold->precise` check is a performance optimization. If `rold->id`
+ * was ever used to access memory / predict jump, the `rold` or any
+ * register used in `rold = r?` / `r? = rold` operations would be marked
+ * as precise, otherwise it's ID is not really interesting.
+ */
+ if (rold->precise && rold->id && !check_ids(rold->id, rcur->id, idmap))
Do we need rold->id checking in the above? check_ids should have
rold->id = 0 properly. Or this is just an optimization?
You are correct, the check_ids() handles this case and it should be inlined,
so there is no need to check rold->id in this 'if' branch.
regs_exact() has check_ids as well. Not sure whether it makes sense to
create a function regs_exact_scalar() just for scalar and include the
above code. Otherwise, it is strange we do check_ids in different
places.
I'm not sure how to best re-organize code here, regs_exact() is a nice
compartmentalized abstraction. It is possible to merge my additional
check_ids() call with the main 'precise' processing part as below:
@@ -15152,21 +15154,22 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
switch (base_type(rold->type)) {
case SCALAR_VALUE:
if (regs_exact(rold, rcur, idmap))
return true;
if (env->explore_alu_limits)
return false;
if (!rold->precise)
return true;
/* new val must satisfy old val knowledge */
return range_within(rold, rcur) &&
- tnum_in(rold->var_off, rcur->var_off);
+ tnum_in(rold->var_off, rcur->var_off) &&
+ check_ids(rold->id, rcur->id, idmap);
I'd say that extending /* new val must satisfy ... */ comment to
explain why check_ids() is needed should be sufficient, but I'm open
for suggestions.
On the other hand, I wanted to have a separate 'if' branch like:
if (rold->precise && !check_ids(rold->id, rcur->id, idmap))
Specifically to explain that 'rold->precise' part is an optimization.
Okay, I think you could keep your original implementation. I do think
checking rold->ref_obj_id in regs_exact is not needed for
SCALAR_VALUE but it may not be that important. The check_ids
checking in reg_exact (for SCALAR_VALUE) can also be skipped
if !rold->precise as an optimization. That is why I mention
to 'inline' regs_exact and re-arrange the codes. You can still
mention that optimization w.r.t. rold->precise. Overall if the code
is more complex, I am okay with your current change.
+ return false;
if (regs_exact(rold, rcur, idmap))
return true;
if (env->explore_alu_limits)