Re: [PATCH v5 bpf-next 2/3] bpf: Relax precision marking in open coded iters and may_goto loop.

On Thu, Jun 6, 2024 at 4:08 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Wed, 2024-06-05 at 17:54 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> This looks interesting, need a bit more time to think about it.
> A few minor notes below.
>
> [...]
>
> > @@ -14704,6 +14705,165 @@ static u8 rev_opcode(u8 opcode)
> >       }
> >  }
> >
> > +/* Similar to mark_reg_unknown() and should only be called from cap_bpf path */
> > +static void mark_unknown(struct bpf_reg_state *reg)
> > +{
> > +     u32 id = reg->id;
> > +
> > +     __mark_reg_unknown_imprecise(reg);
> > +     reg->id = id;
> > +}
> > +/*
> > + * Similar to regs_refine_cond_op(), but instead of tightening the range
> > + * widen the upper bound of reg1 based on reg2 and
> > + * lower bound of reg2 based on reg1.
> > + */
> > +static void widen_reg_bounds(struct bpf_reg_state *reg1,
> > +                          struct bpf_reg_state *reg2,
> > +                          u8 opcode, bool is_jmp32)
> > +{
> > +     switch (opcode) {
> > +     case BPF_JGE:
> > +     case BPF_JGT:
> > +     case BPF_JSGE:
> > +     case BPF_JSGT:
> > +             opcode = flip_opcode(opcode);
> > +             swap(reg1, reg2);
> > +             break;
> > +     default:
> > +             break;
> > +     }
> > +
> > +     switch (opcode) {
> > +     case BPF_JLE:
> > +             if (is_jmp32) {
> > +                     reg1->u32_max_value = reg2->u32_max_value;
> > +                     reg1->s32_max_value = S32_MAX;
> > +                     reg1->umax_value = U64_MAX;
> > +                     reg1->smax_value = S64_MAX;
> > +
> > +                     reg2->u32_min_value = reg1->u32_min_value;
> > +                     reg2->s32_min_value = S32_MIN;
> > +                     reg2->umin_value = 0;
> > +                     reg2->smin_value = S64_MIN;
> > +             } else {
> > +                     reg1->umax_value = reg2->umax_value;
> > +                     reg1->smax_value = S64_MAX;
> > +                     reg1->u32_max_value = U32_MAX;
> > +                     reg1->s32_max_value = S32_MAX;
> > +
> > +                     reg2->umin_value = reg1->umin_value;
> > +                     reg2->smin_value = S64_MIN;
> > +                     reg2->u32_min_value = U32_MIN;
> > +                     reg2->s32_min_value = S32_MIN;
> > +             }
> > +             reg1->var_off = tnum_unknown;
> > +             reg2->var_off = tnum_unknown;
> > +             break;
>
> Just a random thought: suppose that one of the registers in question
> is used as an index int the array of ints, and compiler increments it
> using += 4. Would it be interesting to preserve alignment info in the
> var_off in such case? (in other words, preserve known trailing zeros).

Well, the verifier cannot figure out which register is
an induction variable. The compiler can also generate code
where there are multiple such registers.
But even if there were a single rX += 4,
it's not clear how to figure out the size of the increment.

Also, the above code is called at the time of a comparison like "if 2 < 100".
I figured I would try a heuristic at that point.
See the attached diff.
It computes the alignment of the LHS and RHS and
then heuristically adjusts the range.
After spending all morning on it and various heuristics,
I'm convinced that this is a dead end.
It cannot be made to work with i += 2 loops.

> [...]
>
> > @@ -15177,8 +15339,78 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> >       }
> >
> >       is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
> > +     if (dst_reg->type != SCALAR_VALUE || src_reg->type != SCALAR_VALUE ||
> > +         /* Widen scalars only if they're constants */
> > +         !is_reg_const(dst_reg, is_jmp32) || !is_reg_const(src_reg, is_jmp32))
> > +             do_widen = false;
> > +     else if (reg_const_value(dst_reg, is_jmp32) == reg_const_value(src_reg, is_jmp32))
> > +             /* And not equal */
> > +             do_widen = false;
> > +     else
> > +             do_widen = (get_loop_entry(this_branch) ||
> > +                         this_branch->may_goto_depth) &&
> > +                             /* Gate widen_reg() logic */
> > +                             env->bpf_capable;
> > +
> >       pred = is_branch_taken(dst_reg, src_reg, opcode, is_jmp32);
> > -     if (pred >= 0) {
> > +
> > +     if (do_widen && ((opcode == BPF_JNE && pred == 1) ||
> > +                      (opcode == BPF_JEQ && pred == 0))) {
> > +             /*
> > +              * != is too vague. let's try < and > and widen. Example:
> > +              *
> > +              * R6=2
> > +              * 21: (15) if r6 == 0x3e8 goto pc+14
> > +              * Predicted == not-taken, but < is also true
> > +              * 21: R6=scalar(smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=999,var_off=(0x0; 0x3ff))
> > +              */
> > +             int refine_pred;
> > +             u8 opcode2 = BPF_JLT;
> > +
> > +             refine_pred = is_branch_taken(dst_reg, src_reg, BPF_JLT, is_jmp32);
> > +             if (refine_pred == 1) {
> > +                     widen_reg(env, dst_reg, src_reg, BPF_JLT, is_jmp32, true);
> > +
> > +             } else {
>
> nit: would it be possible to avoid second call to is_branch_taken()
>      by checking that refine_pred == 0 and assuming opcode2 == BPF_JGE?

I considered it, but it's too fragile, since it would depend
on both sides being constant (though the algorithm currently requires it)
and on both sides not being equal (though the algorithm also
checks that beforehand).

Much easier to reason about and experiment with the algorithm
when < and > are checked explicitly.
So I prefer to keep it this way.

> > +                     opcode2 = BPF_JGT;
> > +                     refine_pred = is_branch_taken(dst_reg, src_reg, BPF_JGT, is_jmp32);
> > +                     if (refine_pred == 1)
> > +                             widen_reg(env, dst_reg, src_reg, BPF_JGT, is_jmp32, true);
> > +             }
> > +
> > +             if (refine_pred == 1) {
> > +                     if (dst_reg->id)
> > +                             find_equal_scalars(this_branch, dst_reg);
>
> I think it is necessary to do find_equal_scalars() for src_reg as well,
> since widen_reg() could change both registers.

Kinda.
Doing find_equal_scalars(dst_reg) makes zero difference in tests.
I've added it for completeness,
but decided to avoid spending extra cycles on src_reg,
since most of the time it's BPF_K anyway.
We can add it later when there is clear evidence that it helps.

Attachment: mask.diff
Description: Binary data

