Re: [PATCH bpf-next 2/4] bpf: track find_equal_scalars history on per-instruction level

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 28 Feb 2024 13:36:27 -0800

On Wed, Feb 28, 2024 at 1:16 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Wed, 2024-02-28 at 11:58 -0800, Andrii Nakryiko wrote:
> [...]
>
> > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > index cbfb235984c8..26e32555711c 100644
> > > --- a/include/linux/bpf_verifier.h
> > > +++ b/include/linux/bpf_verifier.h
> > > @@ -361,6 +361,7 @@ struct bpf_jmp_history_entry {
> > >         u32 prev_idx : 22;
> > >         /* special flags, e.g., whether insn is doing register stack spill/load */
> > >         u32 flags : 10;
> > > +       u64 equal_scalars;
> >
> > nit: should we call this concept as a bit more generic "linked
> > registers" instead of "equal scalars"?
>
> It's a historical name for the feature and it is present in a few commit and tests.
> Agree that "linked_registers" is better in current context.
> A bit reluctant but can change it here.
>
> [...]
>
> > I'm wondering if this pop/push set of primitives is the best approach?
>
> I kinda like it :)
>
> > What if we had pack/unpack operations, where for various checking
> > logic we'd be working with "unpacked" representation, e.g., something
> > like this:
> >
> > struct linked_reg_set {
> >     int cnt;
> >     struct {
>
> Will need a name here, otherwise iteration would be somewhat inconvenient.
> Suppose 'struct reg_or_spill'.
>
> >         int frameno;
> >         union {
> >             int spi;
> >             int regno;
> >         };
> >         bool is_set;
> >         bool is_reg;
> >     } reg_set[6];
> > };
> >
> > bt_set_equal_scalars() could accept `struct linked_reg_set*` instead
> > of bitmask itself. Same for find_equal_scalars().
>
> For clients it would be
>
>         while (equal_scalars_pop(&equal_scalars, &fr, &spi, &is_reg)) {
>                 if ((is_reg && bt_is_frame_reg_set(bt, fr, spi)) ||
>                     (!is_reg && bt_is_frame_slot_set(bt, fr, spi)))
>                     ...
>         }
>
>     --- vs ---
>
>         for (i = 0; i < equal_scalars->cnt; ++i) {
>                 struct reg_or_spill *r = equal_scalars->reg_set[i];
>
>                 if ((r->is_reg && bt_is_frame_reg_set(bt, r->frameno, r->regno)) ||
>                     (!r->is_reg && bt_is_frame_slot_set(bt, r->frameno, r->spi)))
>                     ...
>         }
>
> I'd say, no significant difference.

Can I disagree? I find the second to be much better. There is no
in-place modification of a mask, no out parameters, we have a clean
record r with a few fields. We also know the count upfront, though we
maintain a simple rule (mask == 0 => cnt == 0), so not really a big
deal either way.

>
> > I think even implementation of packing/unpacking would be more
> > straightforward and we won't even need all those ES_xxx consts (or at
> > least fewer of them).
> >
> > WDYT?
>
> I wouldn't say it simplifies packing/unpacking much.
> Below is the code using new data structure and it's like
> 59 lines old version vs 56 lines new version.

I'd say it's not about a number of lines, it's about ease of
understanding, reasoning, and using these helpers.

I do prefer the code you wrote below, but I'm not going to die on this
hill if you insist. I'll go think about the rest of the logic.

>
> --- 8< ----------------------------------------------------------------
>
> struct reg_or_spill {
>         int frameno;
>         union {
>                 int spi;
>                 int regno;
>         };
>         bool is_reg;
> };
>
> struct linked_reg_set {
>         int cnt;
>         struct reg_or_spill reg_set[6];
> };
>
> /* Pack one history entry for equal scalars as 10 bits in the following format:
>  * - 3-bits frameno
>  * - 6-bits spi_or_reg
>  * - 1-bit  is_reg
>  */
> static u64 linked_reg_set_pack(struct linked_reg_set *s)
> {
>         u64 val = 0;
>         int i;
>
>         for (i = 0; i < s->cnt; ++i) {
>                 struct reg_or_spill *r = &s->reg_set[i];
>                 u64 tmp = 0;
>
>                 tmp |= r->frameno & ES_FRAMENO_MASK;
>                 tmp |= (r->spi & ES_SPI_MASK) << ES_SPI_OFF;

nit: we shouldn't mask anything here, it just makes an impression that
r->frameno can be bigger than we have bits for it in a bitmask

>                 tmp |= (r->is_reg ? 1 : 0) << ES_IS_REG_OFF;
>
>                 val <<= ES_ENTRY_BITS;
>                 val |= tmp;

val <<= ES_ENTRY_BITS;
val |= r->frameno | (r->spi << ES_SPI_OFF) | ((r->is_reg ? 1 : 0) <<
ES_IS_REG_OFF);

or you can do it as three assignment, but there is no need for tmp

>         }
>         val <<= ES_SIZE_BITS;
>         val |= s->cnt;
>         return val;
> }
>
> static void linked_reg_set_unpack(u64 val, struct linked_reg_set *s)
> {
>         int i;
>
>         s->cnt = val & ES_SIZE_MASK;
>         val >>= ES_SIZE_BITS;
>
>         for (i = 0; i < s->cnt; ++i) {
>                 struct reg_or_spill *r = &s->reg_set[i];
>
>                 r->frameno =  val & ES_FRAMENO_MASK;
>                 r->spi     = (val >> ES_SPI_OFF) & ES_SPI_MASK;
>                 r->is_reg  = (val >> ES_IS_REG_OFF) & 0x1;
>                 val >>= ES_ENTRY_BITS;
>         }
> }
>

I do think that the above is much easier to read and follow.

> ---------------------------------------------------------------- >8 ---