Re: [PATCH bpf-next 01/13] bpf: generalize reg_set_min_max() to handle non-const register comparisons

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 3 Nov 2023 13:48:32 -0700

On Fri, Nov 3, 2023 at 1:39 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Fri, Nov 3, 2023 at 12:52 AM Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> wrote:
> >
> > On Thu, Nov 02, 2023 at 05:08:10PM -0700, Andrii Nakryiko wrote:
> > > Generalize bounds adjustment logic of reg_set_min_max() to handle not
> > > just register vs constant case, but in general any register vs any
> > > register cases. For most of the operations it's trivial extension based
> > > on range vs range comparison logic, we just need to properly pick
> > > min/max of a range to compare against min/max of the other range.
> > >
> > > For BPF_JSET we keep the original capabilities, just make sure JSET is
> > > integrated in the common framework. This is manifested in the
> > > internal-only BPF_KSET + BPF_X "opcode" to allow for simpler and more
> >                     ^ typo?
> >
> > Two more comments below
> >
> > > uniform rev_opcode() handling. See the code for details. This allows to
> > > reuse the same code exactly both for TRUE and FALSE branches without
> > > explicitly handling both conditions with custom code.
> > >
> > > Note also that now we don't need a special handling of BPF_JEQ/BPF_JNE
> > > case none of the registers are constants. This is now just a normal
> > > generic case handled by reg_set_min_max().
> > >
> > > To make tnum handling cleaner, tnum_with_subreg() helper is added, as
> > > that's a common operator when dealing with 32-bit subregister bounds.
> > > This keeps the overall logic much less noisy when it comes to tnums.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > > ---
> > >  include/linux/tnum.h  |   4 +
> > >  kernel/bpf/tnum.c     |   7 +-
> > >  kernel/bpf/verifier.c | 327 ++++++++++++++++++++----------------------
> > >  3 files changed, 165 insertions(+), 173 deletions(-)
> > >
> > > diff --git a/include/linux/tnum.h b/include/linux/tnum.h
> > > index 1c3948a1d6ad..3c13240077b8 100644
> > > --- a/include/linux/tnum.h
> > > +++ b/include/linux/tnum.h
> > > @@ -106,6 +106,10 @@ int tnum_sbin(char *str, size_t size, struct tnum a);
> > >  struct tnum tnum_subreg(struct tnum a);
> > >  /* Returns the tnum with the lower 32-bit subreg cleared */
> > >  struct tnum tnum_clear_subreg(struct tnum a);
> > > +/* Returns the tnum with the lower 32-bit subreg in *reg* set to the lower
> > > + * 32-bit subreg in *subreg*
> > > + */
> > > +struct tnum tnum_with_subreg(struct tnum reg, struct tnum subreg);
> > >  /* Returns the tnum with the lower 32-bit subreg set to value */
> > >  struct tnum tnum_const_subreg(struct tnum a, u32 value);
> > >  /* Returns true if 32-bit subreg @a is a known constant*/
> > > diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
> > > index 3d7127f439a1..f4c91c9b27d7 100644
> > > --- a/kernel/bpf/tnum.c
> > > +++ b/kernel/bpf/tnum.c
> > > @@ -208,7 +208,12 @@ struct tnum tnum_clear_subreg(struct tnum a)
> > >       return tnum_lshift(tnum_rshift(a, 32), 32);
> > >  }
> > >
> > > +struct tnum tnum_with_subreg(struct tnum reg, struct tnum subreg)
> > > +{
> > > +     return tnum_or(tnum_clear_subreg(reg), tnum_subreg(subreg));
> > > +}
> > > +
> > >  struct tnum tnum_const_subreg(struct tnum a, u32 value)
> > >  {
> > > -     return tnum_or(tnum_clear_subreg(a), tnum_const(value));
> > > +     return tnum_with_subreg(a, tnum_const(value));
> > >  }
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 2197385d91dc..52934080042c 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -14379,218 +14379,211 @@ static int is_branch_taken(struct bpf_reg_state *reg1, struct bpf_reg_state *reg
> > >       return is_scalar_branch_taken(reg1, reg2, opcode, is_jmp32);
> > >  }
> > >
> > > -/* Adjusts the register min/max values in the case that the dst_reg and
> > > - * src_reg are both SCALAR_VALUE registers (or we are simply doing a BPF_K
> > > - * check, in which case we havea fake SCALAR_VALUE representing insn->imm).
> > > - * Technically we can do similar adjustments for pointers to the same object,
> > > - * but we don't support that right now.
> > > +/* Opcode that corresponds to a *false* branch condition.
> > > + * E.g., if r1 < r2, then reverse (false) condition is r1 >= r2
> > >   */
> > > -static void reg_set_min_max(struct bpf_reg_state *true_reg1,
> > > -                         struct bpf_reg_state *true_reg2,
> > > -                         struct bpf_reg_state *false_reg1,
> > > -                         struct bpf_reg_state *false_reg2,
> > > -                         u8 opcode, bool is_jmp32)
> > > +static u8 rev_opcode(u8 opcode)
> >
> > Nit: rev_opcode and flip_opcode seems like a possible source of confusing
> > down the line. Flip and reverse are often interchangable, i.e. "flip the
> > order" and "reverse the order" is the same thing.
> >
> > Maybe "neg_opcode" or "neg_cond_opcode"?
>
> neg has too strong connotation with BPF_NEG, so not really happy with
> this one. In selftest I used "complement_op", but it's also quite
> arbitrary.
>
> >
> > Or do it the otherway around, keep rev_opcode but rename flip_opcode.
>
> how about flip_opcode -> swap_opcode? and then keep reg_opcode as is?

nah, swap_opcode sounds wrong as well. I guess I'll just leave it as is for now.

>
> >
> > One more comment about BPF_JSET below
> >
>
> please trim big chunks of code you are not commenting on to keep
> emails a bit shorter
>
> [...]
>
>
> > >               if (is_jmp32) {
> > > -                     __mark_reg32_known(false_reg1, uval32);
> > > -                     false_32off = tnum_subreg(false_reg1->var_off);
> > > +                     if (opcode & BPF_X)
> > > +                             t = tnum_and(tnum_subreg(reg1->var_off), tnum_const(~val));
> > > +                     else
> > > +                             t = tnum_or(tnum_subreg(reg1->var_off), tnum_const(val));
> > > +                     reg1->var_off = tnum_with_subreg(reg1->var_off, t);
> > >               } else {
> > > -                     ___mark_reg_known(false_reg1, uval);
> > > -                     false_64off = false_reg1->var_off;
> > > +                     if (opcode & BPF_X)
> > > +                             reg1->var_off = tnum_and(reg1->var_off, tnum_const(~val));
> > > +                     else
> > > +                             reg1->var_off = tnum_or(reg1->var_off, tnum_const(val));
> > >               }
> > >               break;
> >
> > Since you're already adding a tnum helper, I think we can add one more
> > for BPF_JSET here
> >
> >         struct tnum tnum_neg(struct tnum a)
> >         {
> >                 return TNUM(~a.value, a.mask);
> >         }
> >
>
> I'm not sure what tnum_neg() does (even if the correct
> implementation), but either way I'd like to minimize touching tnum
> stuff, it's too tricky :) we can address that as a separate patch if
> you'd like
>
>
> > So instead of getting a value out of tnum then putting the value back
> > into tnum again
> >
> >     u64 val;
> >     val = reg_const_value(reg2, is_jmp32);
> >     tnum_ops(..., tnum_const(val or ~val);
> >
> > Keep the value in tnum and process it as-is if possible
> >
> >     tnum_ops(..., reg2->var_off or tnum_neg(reg2->var_off));
>
> >
> > And with that hopefully make this fragment short enough that we don't
> > mind duplicate a bit of code to seperate the BPF_JSET case from the
> > BPF_JSET | BPF_X case. IMO a conditional is_power_of_2 check followed by
> > two level of branching is a bit too much to follow, it is better to have
> > them seperated just like how you're doing it for the others already.
>
> I can split those two cases without any new tnum helpers, the
> duplicated part is just const checking, basically, no big deal
>
> >
> > I.e. something like the follow
> >
> >         case BPF_JSET: {
> >                 if (!is_reg_const(reg2, is_jmp32))
> >                         swap(reg1, reg2);
> >                 if (!is_reg_const(reg2, is_jmp32))
> >                         break;
> >                 /* comment */
> >                 if (!is_power_of_2(reg_const_value(reg2, is_jmp32))
> >                         break;
> >
> >                 if (is_jmp32) {
> >                         t = tnum_or(tnum_subreg(reg1->var_off), tnum_subreg(reg2->var_off));
> >                         reg1->var_off = tnum_with_subreg(reg1->var_off, t);
> >                 } else {
> >                         reg1->var_off = tnum_or(reg1->var_off, reg2->var_off);
> >                 }
> >                 break;
> >         }
> >         case BPF_JSET | BPF_X: {
> >                 if (!is_reg_const(reg2, is_jmp32))
> >                         swap(reg1, reg2);
> >                 if (!is_reg_const(reg2, is_jmp32))
> >                         break;
> >
> >                 if (is_jmp32) {
> >                         /* a slightly long line ... */
> >                         t = tnum_and(tnum_subreg(reg1->var_off), tnum_neg(tnum_subreg(reg2->var_off)));
> >                         reg1->var_off = tnum_with_subreg(reg1->var_off, t);
> >                 } else {
> >                         reg1->var_off = tnum_and(reg1->var_off, tnum_neg(reg2->var_off));
> >                 }
> >                 break;
> >         }
> >
> > > ...