Re: [PATCH bpf-next v2 1/4] bpf: verify scalar ids mapping in regsafe() using check_ids()

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 30 May 2023 14:37:04 -0700

On Tue, May 30, 2023 at 10:27 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> Make sure that the following unsafe example is rejected by verifier:
>
> 1: r9 = ... some pointer with range X ...
> 2: r6 = ... unbound scalar ID=a ...
> 3: r7 = ... unbound scalar ID=b ...
> 4: if (r6 > r7) goto +1
> 5: r6 = r7
> 6: if (r6 > X) goto ...
> --- checkpoint ---
> 7: r9 += r7
> 8: *(u64 *)r9 = Y
>
> This example is unsafe because not all execution paths verify r7 range.
> Because of the jump at (4) the verifier would arrive at (6) in two states:
> I.  r6{.id=b}, r7{.id=b} via path 1-6;
> II. r6{.id=a}, r7{.id=b} via path 1-4, 6.
>
> Currently regsafe() does not call check_ids() for scalar registers,
> thus from POV of regsafe() states (I) and (II) are identical. If the
> path 1-6 is taken by verifier first, and checkpoint is created at (6)
> the path [1-4, 6] would be considered safe.
>
> This commit updates regsafe() to call check_ids() for scalar registers.
>
> This change is costly in terms of verification performance.
> Using veristat to compare number of processed states for selftests
> object files listed in tools/testing/selftests/bpf/veristat.cfg and
> Cilium object files from [1] gives the following statistics:
>
>   Filter        | Number of programs
>   ----------------------------------
>   states_pct>10 | 40
>   states_pct>20 | 20
>   states_pct>30 | 15
>   states_pct>40 | 11
>
> (Out of total 177 programs)
>
> In fact, impact is so bad that in no-alu32 mode the following
> test_progs tests no longer pass verifiction:
> - verif_scale2: maximal number of instructions exceeded
> - verif_scale3: maximal number of instructions exceeded
> - verif_scale_pyperf600: maximal number of instructions exceeded
>
> Additionally:
> - verifier_search_pruning/allocated_stack: expected prunning does not
>   happen because of differences in register id allocation.
>
> Followup patch would address these issues.
>
> [1] git@xxxxxxxxxx:anakryiko/cilium.git
>
> Fixes: 75748837b7e5 ("bpf: Propagate scalar ranges through register assignments.")
> Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> ---
>  kernel/bpf/verifier.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index af70dad655ab..9c10f2619c4f 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -15151,6 +15151,28 @@ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
>
>         switch (base_type(rold->type)) {
>         case SCALAR_VALUE:
> +               /* Why check_ids() for scalar registers?
> +                *
> +                * Consider the following BPF code:
> +                *   1: r6 = ... unbound scalar, ID=a ...
> +                *   2: r7 = ... unbound scalar, ID=b ...
> +                *   3: if (r6 > r7) goto +1
> +                *   4: r6 = r7
> +                *   5: if (r6 > X) goto ...
> +                *   6: ... memory operation using r7 ...
> +                *
> +                * First verification path is [1-6]:
> +                * - at (4) same bpf_reg_state::id (b) would be assigned to r6 and r7;
> +                * - at (5) r6 would be marked <= X, find_equal_scalars() would also mark
> +                *   r7 <= X, because r6 and r7 share same id.
> +                *
> +                * Next verification path would start from (5), because of the jump at (3).
> +                * The only state difference between first and second visits of (5) is
> +                * bpf_reg_state::id assignments for r6 and r7: (b, b) vs (a, b).
> +                * Thus, use check_ids() to distinguish these states.
> +                */
> +               if (!check_ids(rold->id, rcur->id, idmap))
> +                       return false;

does this check_ids() have to be performed before regs_exact (which
also checks IDs, btw) *and* before rold->precise check?

Intuitively, it feels like `rold->precise = false` case shouldn't care
about IDs in rcur, as any value should be safe. But I haven't spent
much time thinking about this, so there might be some corner case I'm
missing.

I wonder if moving this check after we handled imprecise rold case would help.

Also, it might make sense to drop SCALAR register IDs as soon as we
have only one instance of it left (e.g., if "paired" register was
overwritten already). I.e., aggressively drop IDs when they become
useless. WDYT?

The u32_hashset.... Have you also measured verification time
regression with this? I wonder how much impact that change has?

>                 if (regs_exact(rold, rcur, idmap))
>                         return true;
>                 if (env->explore_alu_limits)
> --
> 2.40.1
>