Re: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 6 Oct 2020 20:31:23 -0700

On Tue, Oct 6, 2020 at 7:18 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Oct 06, 2020 at 06:56:14PM -0700, Andrii Nakryiko wrote:
> > On Tue, Oct 6, 2020 at 1:14 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> > >
> > > The llvm register allocator may use two different registers representing the
> > > same virtual register. In such case the following pattern can be observed:
> > > 1047: (bf) r9 = r6
> > > 1048: (a5) if r6 < 0x1000 goto pc+1
> > > 1050: ...
> > > 1051: (a5) if r9 < 0x2 goto pc+66
> > > 1052: ...
> > > 1053: (bf) r2 = r9 /* r2 needs to have upper and lower bounds */
> > >
> > > In order to track this information without backtracking allocate ID
> > > for scalars in a similar way as it's done for find_good_pkt_pointers().
> > >
> > > When the verifier encounters r9 = r6 assignment it will assign the same ID
> > > to both registers. Later if either register range is narrowed via conditional
> > > jump propagate the register state into the other register.
> > >
> > > Clear register ID in adjust_reg_min_max_vals() for any alu instruction.
> > >
> > > Newly allocated register ID is ignored for scalars in regsafe() and doesn't
> > > affect state pruning. mark_reg_unknown() also clears the ID.
> > >
> > > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > ---
> >
> > I couldn't find the problem with the logic, though it's quite
> > non-obvious at times that reg->id will be cleared on BPF_END/BPF_NEG
> > and few other operations. But I think naming of this function can be
> > improved, see below.
> >
> > Also, profiler.c is great, but it would still be nice to add selftest
> > to test_verifier that will explicitly test the logic in this patch
>
> the test align.c actualy does the id checking better than I expected.
> I'm planning to add more asm tests in the follow up.
>

ok

Acked-by: Andrii Nakryiko <andrii@xxxxxxxxxx>

> > >  kernel/bpf/verifier.c                         | 38 +++++++++++++++++++
> > >  .../testing/selftests/bpf/prog_tests/align.c  | 16 ++++----
> > >  .../bpf/verifier/direct_packet_access.c       |  2 +-
> > >  3 files changed, 47 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 01120acab09a..09e17b483b0b 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -6432,6 +6432,8 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
> > >         src_reg = NULL;
> > >         if (dst_reg->type != SCALAR_VALUE)
> > >                 ptr_reg = dst_reg;
> > > +       else
> > > +               dst_reg->id = 0;
> > >         if (BPF_SRC(insn->code) == BPF_X) {
> > >                 src_reg = &regs[insn->src_reg];
> > >                 if (src_reg->type != SCALAR_VALUE) {
> > > @@ -6565,6 +6567,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > >                                 /* case: R1 = R2
> > >                                  * copy register state to dest reg
> > >                                  */
> > > +                               if (src_reg->type == SCALAR_VALUE)
> > > +                                       src_reg->id = ++env->id_gen;
> > >                                 *dst_reg = *src_reg;
> > >                                 dst_reg->live |= REG_LIVE_WRITTEN;
> > >                                 dst_reg->subreg_def = DEF_NOT_SUBREG;
> > > @@ -7365,6 +7369,30 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
> > >         return true;
> > >  }
> > >
> > > +static void find_equal_scalars(struct bpf_verifier_state *vstate,
> > > +                              struct bpf_reg_state *known_reg)
> >
> > this is double-misleading name:
> >
> > 1) it's not just "find", but also "update" (or rather the purpose of
> > this function is specifically to update registers, not find them, as
> > we don't really return found register)
> > 2) "equal" is not exactly true either. You can have two scalar
> > register with exactly the same state, but they might not share ->id.
> > So it's less about being equal, rather being "linked" by assignment.
>
> I don't think I can agree.
> We already have find_good_pkt_pointers() that also updates,
> so 'find' fits better than 'update'.

find_good_pkt_pointers() has similarly confusing name, but sure,
consistency rules

> 'linked' is also wrong. The regs are exactly equal.
> In case of pkt and other pointers two regs will have the same id
> as well, but they will not be equal. Here these two scalars are equal
> otherwise doing *reg = *known_reg would be wrong.

Ok, I guess it also means that "reg->type == SCALAR_VALUE" checks
below are unnecessary as well, because if known_reg->id matches, that
means register states are exactly the same.

>
> > > +{
> > > +       struct bpf_func_state *state;
> > > +       struct bpf_reg_state *reg;
> > > +       int i, j;
> > > +
> > > +       for (i = 0; i <= vstate->curframe; i++) {
> > > +               state = vstate->frame[i];
> > > +               for (j = 0; j < MAX_BPF_REG; j++) {
> > > +                       reg = &state->regs[j];
> > > +                       if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
> > > +                               *reg = *known_reg;
> > > +               }
> > > +
> > > +               bpf_for_each_spilled_reg(j, state, reg) {
> > > +                       if (!reg)
> > > +                               continue;
> > > +                       if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)
> > > +                               *reg = *known_reg;
> > > +               }
> > > +       }
> > > +}
> > > +
> > >  static int check_cond_jmp_op(struct bpf_verifier_env *env,
> > >                              struct bpf_insn *insn, int *insn_idx)
> > >  {
> > > @@ -7493,6 +7521,11 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> > >                                 reg_combine_min_max(&other_branch_regs[insn->src_reg],
> > >                                                     &other_branch_regs[insn->dst_reg],
> > >                                                     src_reg, dst_reg, opcode);
> > > +                       if (src_reg->id) {
> > > +                               find_equal_scalars(this_branch, src_reg);
> > > +                               find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg]);
> > > +                       }
> > > +
> > >                 }
> > >         } else if (dst_reg->type == SCALAR_VALUE) {
> > >                 reg_set_min_max(&other_branch_regs[insn->dst_reg],
> > > @@ -7500,6 +7533,11 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> > >                                         opcode, is_jmp32);
> > >         }
> > >
> > > +       if (dst_reg->type == SCALAR_VALUE && dst_reg->id) {
> > > +               find_equal_scalars(this_branch, dst_reg);
> > > +               find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);
> >
> > will this cover the case above where reg_combine_min_max() can update
> > dst_reg's as well?
>
> yes.
>
> > Even if yes, it probably would be more
> > straightforward to call appropriate updates in the respective if
> > branches (it's just a single line for each register, so not like it's
> > duplicating tons of code).
>
> You mean inside reg_set_min_max() and inside reg_combine_min_max() ?
> That won't work because find_equal_scalars() needs access to the whole
> bpf_verifier_state and not just bpf_reg_state.

No, I meant something like this, few lines above:

if (BPF_SRC(insn->code) == BPF_X) {

    if (dst_reg->type == SCALAR_VALUE && src_reg->type == SCALAR_VALUE) {
        if (...)
        else if (...)
        else

        /* both src/dst regs in both this/other branches could have
been updated */
        find_equal_scalars(this_branch, src_reg);
        find_equal_scalars(this_branch, dst_reg);
        find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg])
        find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg])
    }
} else if (dst_reg->type == SCALAR_VALUE) {
    reg_set_min_max(...);

    /* only dst_reg in both branches could have been updated */
    find_equal_scalars(this_branch, dst_reg);
    find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);
}

This keeps find_equal_scalars() for relevant registers very close to
places where those registers are updated, instead of jumping back and
forth between the complicated if  after it, and double-checking under
which circumstances dst_reg can be updated, for example.

>
> > It will make reasoning about this logic
> > easier, IMO. Also, moving reg->id check into find_equal_scalars()
> > would make the above suggestion even cleaner.
>
> I don't think so. I think checking for type == SCALAR && dst_reg->id != 0
> should be done outside of that function. It makes the logic cleaner.
> For the same reason we check type outside of find_good_pkt_pointers().