Re: [PATCH bpf-next v3 2/6] bpf/verifier: refactor checks for range computation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Andrii Nakryiko writes:

> On Wed, Apr 24, 2024 at 3:41 PM Cupertino Miranda
> <cupertino.miranda@xxxxxxxxxx> wrote:
>>
>> Split range computation checks in its own function, isolating pessimitic
>> range set for dst_reg and failing return to a single point.
>>
>> Signed-off-by: Cupertino Miranda <cupertino.miranda@xxxxxxxxxx>
>> Cc: Yonghong Song <yonghong.song@xxxxxxxxx>
>> Cc: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx>
>> Cc: David Faust <david.faust@xxxxxxxxxx>
>> Cc: Jose Marchesi <jose.marchesi@xxxxxxxxxx>
>> Cc: Elena Zannoni <elena.zannoni@xxxxxxxxxx>
>> ---
>>  kernel/bpf/verifier.c | 141 +++++++++++++++++++++++-------------------
>>  1 file changed, 77 insertions(+), 64 deletions(-)
>>
>
> I know you are moving around pre-existing code, so a bunch of nits
> below are to pre-existing code, but let's use this as an opportunity
> to clean it up a bit.
>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 6fe641c8ae33..829a12d263a5 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -13695,6 +13695,82 @@ static void scalar_min_max_arsh(struct bpf_reg_state *dst_reg,
>>         __update_reg_bounds(dst_reg);
>>  }
>>
>> +static bool is_const_reg_and_valid(struct bpf_reg_state reg, bool alu32,
>
> hm.. why passing reg_state by value? Use pointer?
>
Someone mentioned this in a review already and I forgot to change it.
Apologies if I did not reply on this.

The reason why I pass by value, is more of an approach to programming.
I do it as guarantee to the caller that there is no mutation of
the value.
If it is better or worst from a performance point of view it is
arguable, since although it might appear to copy the value it also provides
more information to the compiler of the intent of the callee function,
allowing it to optimize further.
I personally would leave the copy by value, but I understand if you want
to keep having the same code style.


>> +                                  bool *valid)
>> +{
>> +       s64 smin_val = reg.smin_value;
>> +       s64 smax_val = reg.smax_value;
>> +       u64 umin_val = reg.umin_value;
>> +       u64 umax_val = reg.umax_value;
>> +
>
> don't add empty line between variable declarations, all variables
> should be in a single continuous block
>
>> +       s32 s32_min_val = reg.s32_min_value;
>> +       s32 s32_max_val = reg.s32_max_value;
>> +       u32 u32_min_val = reg.u32_min_value;
>> +       u32 u32_max_val = reg.u32_max_value;
>> +
>
> but see below, I'm not sure we even need these local variables, they
> don't save all that much typing
>
>> +       bool known = alu32 ? tnum_subreg_is_const(reg.var_off) :
>> +                            tnum_is_const(reg.var_off);
>
> "known" is a misnomer, imo. It's `is_const`.
>
>> +
>> +       if (alu32) {
>> +               if ((known &&
>> +                    (s32_min_val != s32_max_val || u32_min_val != u32_max_val)) ||
>> +                     s32_min_val > s32_max_val || u32_min_val > u32_max_val)
>> +                       *valid = false;
>> +       } else {
>> +               if ((known &&
>> +                    (smin_val != smax_val || umin_val != umax_val)) ||
>> +                   smin_val > smax_val || umin_val > umax_val)
>> +                       *valid = false;
>> +       }
>> +
>> +       return known;
>
>
> The above is really hard to follow, especially how known && !known
> cases are being handled is very easy to misinterpret. How about we
> rewrite the equivalent logic in a few steps:
>
> if (alu32) {
>     if (tnum_subreg_is_const(reg.var_off)) {
>         return reg->s32_min_value == reg->s32_max_value &&
>                reg->u32_min_value == reg->u32_max_value;
>     } else {
>         return reg->s32_min_value <= reg->s32_max_value &&
>                reg->u32_min_value <= reg->u32_max_value;
>     }
> } else {
>    /* same as above for 64-bit bounds */
> }
>
> And you don't even need any local variables, while all the important
> conditions are a bit more easy to follow? Or is it just me?
>

With current state of the code, indeed, it seems you don't need the extra
valid argument to pass the extra information.
Considering that the KNOWN value is now only used in the shift
operators, it seems now safe to merge both valid and the return value
from the function, since the logic will result in the same behaviour.

>> +}
>> +
>> +static bool is_safe_to_compute_dst_reg_range(struct bpf_insn *insn,
>> +                                            struct bpf_reg_state src_reg)
>> +{
>> +       bool src_known;
>> +       u64 insn_bitness = (BPF_CLASS(insn->code) == BPF_ALU64) ? 64 : 32;
>
> whole u64 for this seems like an overkill, I'd just stick to `int`
>
>> +       bool alu32 = (BPF_CLASS(insn->code) != BPF_ALU64);
>
> insn_bitness == 32 ?
>
>> +       u8 opcode = BPF_OP(insn->code);
>> +
>
> nit: don't split variables block with empty line
>
>> +       bool valid_known = true;
>
> need an empty line between variable declarations and the rest
>
>> +       src_known = is_const_reg_and_valid(src_reg, alu32, &valid_known);
>> +
>> +       /* Taint dst register if offset had invalid bounds
>> +        * derived from e.g. dead branches.
>> +        */
>> +       if (valid_known == false)
>
> nit: !valid_known
>
>> +               return false;
>
> given this logic/handling, why not just return false from
> is_const_reg_and_valid() if it's a constant, but it's not
> valid/consistent? It's simpler and fits the logic and function's name,
> no? See my suggestion above
>
>> +
>> +       switch (opcode) {
>
> inline opcode variable here, you use it just once
>
>> +       case BPF_ADD:
>> +       case BPF_SUB:
>> +       case BPF_AND:
>> +               return true;
>> +
>> +       /* Compute range for the following only if the src_reg is known.
>> +        */
>> +       case BPF_XOR:
>> +       case BPF_OR:
>> +       case BPF_MUL:
>> +               return src_known;
>> +
>> +       /* Shift operators range is only computable if shift dimension operand
>> +        * is known. Also, shifts greater than 31 or 63 are undefined. This
>> +        * includes shifts by a negative number.
>> +        */
>> +       case BPF_LSH:
>> +       case BPF_RSH:
>> +       case BPF_ARSH:
>
> preserve original comment?
>
>> -                       /* Shifts greater than 31 or 63 are undefined.
>> -                        * This includes shifts by a negative number.
>> -                        */
>
>> +               return (src_known && src_reg.umax_value < insn_bitness);
>
> nit: unnecessary ()
>
>> +       default:
>> +               break;
>
> return false here, and drop return below
>
>> +       }
>> +
>> +       return false;
>> +}
>> +
>>  /* WARNING: This function does calculations on 64-bit values, but the actual
>>   * execution may occur on 32-bit values. Therefore, things like bitshifts
>>   * need extra checks in the 32-bit case.
>
> [...]

Apart from the obvious coding style problems I will address those optimizations
in an independent patch in the end, if you agree with. I would prefer to
separate the improvements to avoid to change semantics in the
refactoring patch, as previously requested by Yonghong.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux