Re: [PATCH v2 bpf-next 9/9] bpf: precise scalar_value tracking

On 6/14/19 2:45 PM, Andrii Nakryiko wrote:
> On Fri, Jun 14, 2019 at 12:26 AM Alexei Starovoitov <ast@xxxxxxxxxx> wrote:
>>
>> Introduce precision tracking logic that
>> helps cilium programs the most:
>>                      old clang  old clang         new clang  new clang
>>                                 with all patches             with all patches
>> bpf_lb-DLB_L3.o      1838       2728              1923       2216
>> bpf_lb-DLB_L4.o      3218       3562              3077       3390
>> bpf_lb-DUNKNOWN.o    1064       544               1062       543
>> bpf_lxc-DDROP_ALL.o  26935      15989             166729     15372
>> bpf_lxc-DUNKNOWN.o   34439      26043             174607     22156
>> bpf_netdev.o         9721       8062              8407       7312
>> bpf_overlay.o        6184       6138              5420       5555
>> bpf_lxc_jit.o        39389      39452             39389      39452
>>
>> Consider code:
>> 654: (85) call bpf_get_hash_recalc#34
>> 655: (bf) r7 = r0
>> 656: (15) if r8 == 0x0 goto pc+29
>> 657: (bf) r2 = r10
>> 658: (07) r2 += -48
>> 659: (18) r1 = 0xffff8881e41e1b00
>> 661: (85) call bpf_map_lookup_elem#1
>> 662: (15) if r0 == 0x0 goto pc+23
>> 663: (69) r1 = *(u16 *)(r0 +0)
>> 664: (15) if r1 == 0x0 goto pc+21
>> 665: (bf) r8 = r7
>> 666: (57) r8 &= 65535
>> 667: (bf) r2 = r8
>> 668: (3f) r2 /= r1
>> 669: (2f) r2 *= r1
>> 670: (bf) r1 = r8
>> 671: (1f) r1 -= r2
>> 672: (57) r1 &= 255
>> 673: (25) if r1 > 0x1e goto pc+12
>>   R0=map_value(id=0,off=0,ks=20,vs=64,imm=0) R1_w=inv(id=0,umax_value=30,var_off=(0x0; 0x1f))
>> 674: (67) r1 <<= 1
>> 675: (0f) r0 += r1
>>
>> At this point the verifier will notice that scalar R1 is used in map pointer adjustment.
>> R1 has to be precise for later operations on R0 to be validated properly.
>>
>> The verifier will backtrack the above code in the following way:
>> last_idx 675 first_idx 664
>> regs=2 stack=0 before 675: (0f) r0 += r1         // started backtracking R1 regs=2 is a bitmask
>> regs=2 stack=0 before 674: (67) r1 <<= 1
>> regs=2 stack=0 before 673: (25) if r1 > 0x1e goto pc+12
>> regs=2 stack=0 before 672: (57) r1 &= 255
>> regs=2 stack=0 before 671: (1f) r1 -= r2         // now both R1 and R2 have to be precise -> regs=6 mask
>> regs=6 stack=0 before 670: (bf) r1 = r8          // after this insn R8 and R2 have to be precise
>> regs=104 stack=0 before 669: (2f) r2 *= r1       // after this one R8, R2, and R1
>> regs=106 stack=0 before 668: (3f) r2 /= r1
>> regs=106 stack=0 before 667: (bf) r2 = r8
>> regs=102 stack=0 before 666: (57) r8 &= 65535
>> regs=102 stack=0 before 665: (bf) r8 = r7
>> regs=82 stack=0 before 664: (15) if r1 == 0x0 goto pc+21
>>   // this is the end of verifier state. The following regs will be marked precise:
>>   R1_rw=invP(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R7_rw=invP(id=0)
>> parent didn't have regs=82 stack=0 marks         // so backtracking continues into parent state
>> last_idx 663 first_idx 655
>> regs=82 stack=0 before 663: (69) r1 = *(u16 *)(r0 +0)   // R1 was assigned no need to track it further
>> regs=80 stack=0 before 662: (15) if r0 == 0x0 goto pc+23    // keep tracking R7
>> regs=80 stack=0 before 661: (85) call bpf_map_lookup_elem#1  // keep tracking R7
>> regs=80 stack=0 before 659: (18) r1 = 0xffff8881e41e1b00
>> regs=80 stack=0 before 658: (07) r2 += -48
>> regs=80 stack=0 before 657: (bf) r2 = r10
>> regs=80 stack=0 before 656: (15) if r8 == 0x0 goto pc+29
>> regs=80 stack=0 before 655: (bf) r7 = r0                // here the assignment into R7
>>   // mark R0 to be precise:
>>   R0_rw=invP(id=0)
>> parent didn't have regs=1 stack=0 marks                 // regs=1 -> tracking R0
>> last_idx 654 first_idx 644
>> regs=1 stack=0 before 654: (85) call bpf_get_hash_recalc#34 // and in the parent frame it was a return value
>>    // nothing further to backtrack
>>
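To make the mask arithmetic in the trace above concrete, here is a rough
userspace sketch (not the patch's code). It assumes the regs= values are
printed in hex, so bit N stands for register N, and it models only four
instruction shapes. struct insn_model, the op kinds, and the function names
are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified instruction shapes. */
enum op_kind {
	OP_MOV_REG,	/* dst = src */
	OP_ALU_REG,	/* dst op= src */
	OP_ALU_IMM,	/* dst op= imm */
	OP_JMP_IMM,	/* if dst cmp imm goto ... */
};

struct insn_model {
	enum op_kind kind;
	int dst;
	int src;	/* ignored for the *_IMM kinds */
};

/* Step backwards over one insn and return the updated "must be precise" mask. */
static uint32_t backtrack_one(const struct insn_model *in, uint32_t reg_mask)
{
	uint32_t dreg, sreg;

	assert(in->dst >= 0 && in->dst <= 10 && in->src >= 0 && in->src <= 10);
	dreg = 1u << in->dst;
	sreg = 1u << in->src;

	switch (in->kind) {
	case OP_MOV_REG:
		/* dst was fully defined by src: stop tracking dst, track src */
		if (reg_mask & dreg) {
			reg_mask &= ~dreg;
			reg_mask |= sreg;
		}
		break;
	case OP_ALU_REG:
		/* dst depends on its old value and on src: track both */
		if (reg_mask & dreg)
			reg_mask |= sreg;
		break;
	case OP_ALU_IMM:
	case OP_JMP_IMM:
		/* no other register feeds in: mask is unchanged */
		break;
	}
	return reg_mask;
}

/* Insns 675 down to 664 from the trace, in backtracking order. */
static const struct insn_model trace_675_to_664[] = {
	{ OP_ALU_REG, 0, 1 },	/* 675: r0 += r1 */
	{ OP_ALU_IMM, 1, 0 },	/* 674: r1 <<= 1 */
	{ OP_JMP_IMM, 1, 0 },	/* 673: if r1 > 0x1e goto */
	{ OP_ALU_IMM, 1, 0 },	/* 672: r1 &= 255 */
	{ OP_ALU_REG, 1, 2 },	/* 671: r1 -= r2 */
	{ OP_MOV_REG, 1, 8 },	/* 670: r1 = r8 */
	{ OP_ALU_REG, 2, 1 },	/* 669: r2 *= r1 */
	{ OP_ALU_REG, 2, 1 },	/* 668: r2 /= r1 */
	{ OP_MOV_REG, 2, 8 },	/* 667: r2 = r8 */
	{ OP_ALU_IMM, 8, 0 },	/* 666: r8 &= 65535 */
	{ OP_MOV_REG, 8, 7 },	/* 665: r8 = r7 */
	{ OP_JMP_IMM, 1, 0 },	/* 664: if r1 == 0x0 goto */
};

/* Backtrack over the first n entries of insns, starting from reg_mask. */
static uint32_t backtrack_span(const struct insn_model *insns, size_t n,
			       uint32_t reg_mask)
{
	size_t i;

	for (i = 0; i < n; i++)
		reg_mask = backtrack_one(&insns[i], reg_mask);
	return reg_mask;
}
```

Starting from 1u << 1 (R1) and replaying all twelve entries ends with 0x82,
i.e. R1 and R7, which matches the regs=82 line at insn 664.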
>> Two scalar registers not marked precise are equivalent from state pruning point of view.
>> More details in the patch comments.
>>
>> It doesn't support bpf2bpf calls yet and is enabled for root only.
>>
>> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
>> ---
> 
> <snip>
> 
>> @@ -958,6 +983,17 @@ static void __reg_bound_offset(struct bpf_reg_state *reg)
>>          reg->var_off = tnum_intersect(reg->var_off,
>>                                        tnum_range(reg->umin_value,
>>                                                   reg->umax_value));
>> +       /* if register became known constant after a sequence of comparisons
>> +        * or arithmetic operations mark it precise now, since backtracking
>> +        * cannot follow such logic.
>> +        * Example:
>> +        * r0 = get_random();
>> +        * if (r0 < 1) goto ..
>> +        * if (r0 > 1) goto ..
>> +        * r0 is const here
>> +        */
>> +       if (tnum_is_const(reg->var_off))
>> +               reg->precise = true;
> 
> I'm not sure you have to do this: r0 value might never be used in a
> "precise" context. But worse, if it is required to be precise,
> backtracking logic will stop here, while it has to continue to the
> previous conditional jumps and keep marking r0 as precise.

Excellent catch.
That was a leftover from when backtracking was only tracking constants.
But it wasn't that effective at reducing insn_processed.
Removed it.
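
For anyone following along, the r0 example in the quoted comment becomes
const because the bounds get folded back into var_off. Below is a small
userspace sketch modelled on how I understand tnum_range() and
tnum_intersect() to behave. It is an approximation for illustration, not the
kernel source, and the tn_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tristate number model: bits set in mask are unknown, the rest equal value. */
struct tn {
	uint64_t value;
	uint64_t mask;
};

/* 1-based position of the highest set bit, 0 when x == 0. */
static int tn_fls64(uint64_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* A tn that covers every value in [min, max]. */
static struct tn tn_range(uint64_t min, uint64_t max)
{
	uint64_t chi = min ^ max, delta;
	int bits = tn_fls64(chi);
	struct tn t;

	assert(min <= max);
	if (bits > 63) {
		/* 1ULL << 64 would be undefined, so give up: all bits unknown */
		t.value = 0;
		t.mask = UINT64_MAX;
		return t;
	}
	delta = (1ULL << bits) - 1;
	t.value = min & ~delta;
	t.mask = delta;
	return t;
}

/* Combine what two tn values know about the same register. */
static struct tn tn_intersect(struct tn a, struct tn b)
{
	uint64_t v = a.value | b.value;
	uint64_t mu = a.mask & b.mask;
	struct tn t = { v & ~mu, mu };

	return t;
}

static bool tn_is_const(struct tn t)
{
	return t.mask == 0;
}

/* Start from a fully unknown scalar and fold in [umin, umax]. */
static struct tn tn_from_bounds(uint64_t umin, uint64_t umax)
{
	struct tn unknown = { 0, UINT64_MAX };

	return tn_intersect(unknown, tn_range(umin, umax));
}
```

With umin == umax == 1 (the state after the two branches in the comment) the
result has an empty mask, so tn_is_const() is true. With bounds [0, 30] the
result is (0x0; 0x1f), the same var_off that R1 shows at insn 673 above.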

> 
>>   }
>>
>>   /* Reset the min/max bounds of a register */
>> @@ -967,6 +1003,9 @@ static void __mark_reg_unbounded(struct bpf_reg_state *reg)
>>          reg->smax_value = S64_MAX;
>>          reg->umin_value = 0;
>>          reg->umax_value = U64_MAX;
>> +
>> +       /* constant backtracking is enabled for root only for now */
>> +       reg->precise = capable(CAP_SYS_ADMIN) ? false : true;
>>   }
>>
>>   /* Mark a register as having a completely unknown (scalar) value. */
>> @@ -1457,6 +1496,9 @@ static int check_stack_write(struct bpf_verifier_env *env,
>>
>>          if (reg && size == BPF_REG_SIZE && register_is_const(reg) &&
>>              !register_is_null(reg) && env->allow_ptr_leaks) {
>> +               if (env->prog->insnsi[insn_idx].dst_reg != BPF_REG_FP)
>> +                       /* backtracking logic can only recognize explicit [fp-X] */
>> +                       reg->precise = true;
> 
> This has a similar problem to the one above. Every time you proactively
> mark some register/stack slot as precise, you have to do backtrack logic
> to mark the relevant registers precise.

Another great point!
Indeed. Switched to mark_chain_precision() at this point and
added a comment.

> 
>>                  save_register_state(state, spi, reg);
>>          } else if (reg && is_spillable_regtype(reg->type)) {
>>                  /* register containing pointer is being spilled into stack */
>> @@ -1610,6 +1652,10 @@ static int check_stack_read(struct bpf_verifier_env *env,
>>                                   * so the whole register == const_zero
>>                                   */
>>                                  __mark_reg_const_zero(&state->regs[value_regno]);
>> +                               /* backtracking doesn't support STACK_ZERO yet,
>> +                                * so conservatively mark it precise
>> +                                */
>> +                               state->regs[value_regno].precise = true;
> 
> This is probably ok without backtracking, because of STACK_ZERO being
> implicitly precise. But flagging just in case.

After further analysis, it's ok to do precise=true here,
but the corresponding spill also needs to have:
if (reg && register_is_null(reg))
       /* backtracking doesn't work for STACK_ZERO yet. */
       err = mark_chain_precision(reg);

Also added a comment.

> 
>>                          } else {
>>                                  /* have read misc data from the stack */
>>                                  mark_reg_unknown(env, state->regs, value_regno);
>> @@ -2735,6 +2781,369 @@ static int int_ptr_type_to_size(enum bpf_arg_type type)
>>          return -EINVAL;
>>   }
>>
>> +/* for any branch, call, exit record the history of jmps in the given state */
>> +static int push_jmp_history(struct bpf_verifier_env *env,
>> +                           struct bpf_verifier_state *cur)
>> +{
>> +       struct bpf_idx_pair *p;
>> +       u32 cnt = cur->jmp_history_cnt;
> 
> Reverse Christmas tree.

lol. 'patch delay' mechanism is now used against me :)
fixed.

>> +
>> +       cnt++;
>> +       p = krealloc(cur->jmp_history, cnt * sizeof(*p), GFP_USER);
>> +       if (!p)
>> +               return -ENOMEM;
>> +       p[cnt - 1].idx = env->insn_idx;
>> +       p[cnt - 1].prev_idx = env->prev_insn_idx;
>> +       cur->jmp_history = p;
>> +       cur->jmp_history_cnt = cnt;
>> +       return 0;
>> +}
>> +
>> +/* Backtrack one insn at a time. If idx is not at the top of recorded
>> + * history then previous instruction came from straight line execution.
>> + */
>> +static int pop_and_get_prev_idx(struct bpf_verifier_state *st, int i)
> 
> This operation destroys jmp_history, which is a problem if there is
> another branch yet-to-be-processed, which might need jmp history again
> to mark some other register as precise.

Absolutely. Fixed.

>> +{
>> +       u32 cnt = st->jmp_history_cnt;
>> +
>> +       if (cnt && st->jmp_history[cnt - 1].idx == i) {
>> +               i = st->jmp_history[cnt - 1].prev_idx;
>> +               st->jmp_history_cnt--;
>> +       } else {
>> +               i--;
>> +       }
>> +       return i;
>> +}
>> +
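
One way to keep jmp_history intact, as a hedged sketch rather than what v3
necessarily does: walk the history through a caller-owned cursor instead of
decrementing jmp_history_cnt. The types below are simplified userspace
stand-ins, and get_prev_idx(), walk_back(), and the sample history are
hypothetical.

```c
#include <stdint.h>

/* Simplified stand-ins for the verifier structures. */
struct idx_pair {
	uint32_t prev_idx;
	uint32_t idx;
};

struct vstate {
	const struct idx_pair *jmp_history;
	uint32_t jmp_history_cnt;
};

/* Same logic as the quoted helper, but only *cursor is consumed;
 * the state's history and count are left untouched.
 */
static int get_prev_idx(const struct vstate *st, int i, uint32_t *cursor)
{
	uint32_t cnt = *cursor;

	if (cnt && st->jmp_history[cnt - 1].idx == (uint32_t)i) {
		i = (int)st->jmp_history[cnt - 1].prev_idx;
		(*cursor)--;
	} else {
		i--;
	}
	return i;
}

/* Step back a given number of insns using a private cursor. */
static int walk_back(const struct vstate *st, int from, int steps)
{
	uint32_t cursor = st->jmp_history_cnt;
	int i = from;

	while (steps-- > 0)
		i = get_prev_idx(st, i, &cursor);
	return i;
}

/* Made-up history: insn 10 was reached by a jump from insn 3. */
static const struct idx_pair sample_history[] = {
	{ 3, 10 },
};

static const struct vstate sample_state = { sample_history, 1 };
```

Because each walk starts from its own copy of the count, a second
backtracking pass over the same state sees the same history as the first.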
> 
> <snip>
> 
>> +       } else if (class == BPF_JMP || class == BPF_JMP32) {
>> +               if (opcode == BPF_CALL) {
>> +                       if (insn->src_reg == BPF_PSEUDO_CALL)
>> +                               return -ENOTSUPP;
>> +                       else
>> +                               /* regular helper call sets R0 */
>> +                               *reg_mask &= ~1;
> 
> Regular helper also clobbers R1-R5, which from the standpoint of
> verifier should be treated as R[1-5] = <UNKNOWN>, so:
> 
> *reg_mask &= ~0x3f

It wasn't clearing because backtracking starts from the insn that
triggered backtracking.
And in the case of a call to a helper it would clear the reg immediately :)
So I had a -1 hack, but that wasn't correct either due to jmp_history.
So now I've added logic to skip the first insn that triggered backtracking
and added a warning
if (*reg_mask & 0x3f)
/* if backtracking was looking for registers R1-R5
 * they should have been found already.
 */
which is a good check for sanity of backtracking.
I was happy to see that it didn't fire in any of the tests :)
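
Spelled out as a standalone sketch (userspace, hypothetical names, and only
my reading of the check described above): the helper call defines R0, so R0
drops out of the mask, and any of R1-R5 still being tracked at that point
means backtracking missed the insn that set them.

```c
#include <stdbool.h>
#include <stdint.h>

#define R0_BIT		(1u << 0)
#define R0_TO_R5_BITS	0x3fu

/* A helper call sets R0, so R0 no longer needs tracking before the call. */
static uint32_t mask_after_helper_call(uint32_t reg_mask)
{
	return reg_mask & ~R0_BIT;
}

/* With R0 cleared, testing against 0x3f only catches leftover R1-R5 bits. */
static bool helper_call_mask_is_sane(uint32_t reg_mask)
{
	return (mask_after_helper_call(reg_mask) & R0_TO_R5_BITS) == 0;
}
```

For example, a mask of 0x81 (R0 and R7) is fine and becomes 0x80, while a
mask of 0x04 (R2) would trip the warning.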

> 
>> +               } else if (opcode == BPF_EXIT) {
>> +                       return -ENOTSUPP;
>> +               }
>> +       } else if (class == BPF_LD) {
>> +               if (!(*reg_mask & dreg))
>> +                       return 0;
> 
> <snip>
> 
>> + *
>> + * Note the verifier cannot simply walk register parentage chain,
>> + * since many different registers and stack slots could have been
>> + * used to compute single precise scalar.
>> + *
>> + * It's not safe to start with precise=true and backtrack
>> + * when passing scalar register into a helper that takes ARG_ANYTHING.
> 
> It took me many reads to understand what this means (I think). Here
> you are saying that approach of starting with precise=true for
> register and then backtracking to mark it as not precise when we
> detect that we don't care about specific value (e.g., when helper
> takes register as ARG_ANYTHING parameter) is not safe. Is that correct
> interpretation? If yes, slightly less brief comment might be
> appropriate ;)

good point. reworded.

> 
>> + *
>> + * It's ok to walk single parentage chain of the verifier states.
>> + * It's possible that this backtracking will go all the way till 1st insn.
>> + * All other branches will be explored for needing precision later.
>> + *
>> + * The backtracking needs to deal with cases like:
>> + *   R8=map_value(id=0,off=0,ks=4,vs=1952,imm=0) R9_w=map_value(id=0,off=40,ks=4,vs=1952,imm=0)
>> + * r9 -= r8
>> + * r5 = r9
>> + * if r5 > 0x79f goto pc+7
>> + *    R5_w=inv(id=0,umax_value=1951,var_off=(0x0; 0x7ff))
>> + * r5 += 1
>> + * ...
>> + * call bpf_perf_event_output#25
>> + *   where .arg5_type = ARG_CONST_SIZE_OR_ZERO
>> + *
>> + * and this case:
>> + * r6 = 1
>> + * call foo // uses callee's r6 inside to compute r0
>> + * r0 += r6
>> + * if r0 == 0 goto
>> + *
>> + * to track above reg_mask/stack_mask needs to be independent for each frame.
>> + *
>> + * Alslo if parent's curframe > frame where backtracking started,
> 
> typo: Alslo -> Also

fixed

> <snip>
> 
>> +
>> +static int mark_chain_precision(struct bpf_verifier_env *env, int regno)
>> +{
>> +       struct bpf_verifier_state *st = env->cur_state, *parent = st->parent;
>> +       int last_idx = env->insn_idx;
>> +       int first_idx = st->first_insn_idx;
>> +       struct bpf_func_state *func;
>> +       struct bpf_reg_state *reg;
>> +       u32 reg_mask = 1u << regno;
>> +       u64 stack_mask = 0;
>> +       int i, err;
> 
> reverse Christmas tree :)

not this one.
it's better to group variables logically.
last+first, func+reg, reg+stack.

> 
>> +
>> +       func = st->frame[st->curframe];
>> +       reg = &func->regs[regno];
>> +       if (reg->type != SCALAR_VALUE) {
> 
> <snip>
> 
>> +                       }
>> +               }
>> +               st = parent;
> 
> not sure why you need parent variable, just st = st->parent

fixed

>> +               if (!st)
>> +                       break;
>> +
> 
> <snip>
> 
>>
>> @@ -4120,6 +4531,9 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
>>                  return 0;
>>          }
>>
>> +       if (src_reg.precise)
>> +               dst_reg->precise = true;
> 
> This doesn't seem necessary or correct. If dst_reg is never used in a
> precise context, then it doesn't have to be precise.

correct. this type of propagation is not safe. removed it.

Thanks a ton for the detailed code review. v3 is coming.



