Re: [PATCH bpf-next v3 2/6] bpf/verifier: refactor checks for range computation

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 25 Apr 2024 16:05:05 -0700

On Wed, Apr 24, 2024 at 3:41 PM Cupertino Miranda
<cupertino.miranda@xxxxxxxxxx> wrote:
>
> Split range computation checks in its own function, isolating pessimitic
> range set for dst_reg and failing return to a single point.
>
> Signed-off-by: Cupertino Miranda <cupertino.miranda@xxxxxxxxxx>
> Cc: Yonghong Song <yonghong.song@xxxxxxxxx>
> Cc: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx>
> Cc: David Faust <david.faust@xxxxxxxxxx>
> Cc: Jose Marchesi <jose.marchesi@xxxxxxxxxx>
> Cc: Elena Zannoni <elena.zannoni@xxxxxxxxxx>
> ---
>  kernel/bpf/verifier.c | 141 +++++++++++++++++++++++-------------------
>  1 file changed, 77 insertions(+), 64 deletions(-)
>

I know you are moving around pre-existing code, so a bunch of nits
below are to pre-existing code, but let's use this as an opportunity
to clean it up a bit.

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 6fe641c8ae33..829a12d263a5 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -13695,6 +13695,82 @@ static void scalar_min_max_arsh(struct bpf_reg_state *dst_reg,
>         __update_reg_bounds(dst_reg);
>  }
>
> +static bool is_const_reg_and_valid(struct bpf_reg_state reg, bool alu32,

hm.. why passing reg_state by value? Use pointer?

> +                                  bool *valid)
> +{
> +       s64 smin_val = reg.smin_value;
> +       s64 smax_val = reg.smax_value;
> +       u64 umin_val = reg.umin_value;
> +       u64 umax_val = reg.umax_value;
> +

don't add empty line between variable declarations, all variables
should be in a single continuous block

> +       s32 s32_min_val = reg.s32_min_value;
> +       s32 s32_max_val = reg.s32_max_value;
> +       u32 u32_min_val = reg.u32_min_value;
> +       u32 u32_max_val = reg.u32_max_value;
> +

but see below, I'm not sure we even need these local variables, they
don't save all that much typing

> +       bool known = alu32 ? tnum_subreg_is_const(reg.var_off) :
> +                            tnum_is_const(reg.var_off);

"known" is a misnomer, imo. It's `is_const`.

> +
> +       if (alu32) {
> +               if ((known &&
> +                    (s32_min_val != s32_max_val || u32_min_val != u32_max_val)) ||
> +                     s32_min_val > s32_max_val || u32_min_val > u32_max_val)
> +                       *valid = false;
> +       } else {
> +               if ((known &&
> +                    (smin_val != smax_val || umin_val != umax_val)) ||
> +                   smin_val > smax_val || umin_val > umax_val)
> +                       *valid = false;
> +       }
> +
> +       return known;

The above is really hard to follow, especially how known && !known
cases are being handled is very easy to misinterpret. How about we
rewrite the equivalent logic in a few steps:

if (alu32) {
    if (tnum_subreg_is_const(reg.var_off)) {
        return reg->s32_min_value == reg->s32_max_value &&
               reg->u32_min_value == reg->u32_max_value;
    } else {
        return reg->s32_min_value <= reg->s32_max_value &&
               reg->u32_min_value <= reg->u32_max_value;
    }
} else {
   /* same as above for 64-bit bounds */
}

And you don't even need any local variables, while all the important
conditions are a bit more easy to follow? Or is it just me?

> +}
> +
> +static bool is_safe_to_compute_dst_reg_range(struct bpf_insn *insn,
> +                                            struct bpf_reg_state src_reg)
> +{
> +       bool src_known;
> +       u64 insn_bitness = (BPF_CLASS(insn->code) == BPF_ALU64) ? 64 : 32;

whole u64 for this seems like an overkill, I'd just stick to `int`

> +       bool alu32 = (BPF_CLASS(insn->code) != BPF_ALU64);

insn_bitness == 32 ?

> +       u8 opcode = BPF_OP(insn->code);
> +

nit: don't split variables block with empty line

> +       bool valid_known = true;

need an empty line between variable declarations and the rest

> +       src_known = is_const_reg_and_valid(src_reg, alu32, &valid_known);
> +
> +       /* Taint dst register if offset had invalid bounds
> +        * derived from e.g. dead branches.
> +        */
> +       if (valid_known == false)

nit: !valid_known

> +               return false;

given this logic/handling, why not just return false from
is_const_reg_and_valid() if it's a constant, but it's not
valid/consistent? It's simpler and fits the logic and function's name,
no? See my suggestion above

> +
> +       switch (opcode) {

inline opcode variable here, you use it just once

> +       case BPF_ADD:
> +       case BPF_SUB:
> +       case BPF_AND:
> +               return true;
> +
> +       /* Compute range for the following only if the src_reg is known.
> +        */
> +       case BPF_XOR:
> +       case BPF_OR:
> +       case BPF_MUL:
> +               return src_known;
> +
> +       /* Shift operators range is only computable if shift dimension operand
> +        * is known. Also, shifts greater than 31 or 63 are undefined. This
> +        * includes shifts by a negative number.
> +        */
> +       case BPF_LSH:
> +       case BPF_RSH:
> +       case BPF_ARSH:

preserve original comment?

> -                       /* Shifts greater than 31 or 63 are undefined.
> -                        * This includes shifts by a negative number.
> -                        */

> +               return (src_known && src_reg.umax_value < insn_bitness);

nit: unnecessary ()

> +       default:
> +               break;

return false here, and drop return below

> +       }
> +
> +       return false;
> +}
> +
>  /* WARNING: This function does calculations on 64-bit values, but the actual
>   * execution may occur on 32-bit values. Therefore, things like bitshifts
>   * need extra checks in the 32-bit case.

[...]