Re: [PATCH bpf-next v2 1/3] bpf: track find_equal_scalars history on per-instruction level

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 9 Jul 2024 22:28:23 -0700

On Tue, Jul 9, 2024 at 6:21 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Tue, 2024-07-09 at 17:34 -0700, Andrii Nakryiko wrote:
> > On Fri, Jul 5, 2024 at 1:59 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> > >
> > > Use bpf_verifier_state->jmp_history to track which registers were
> > > updated by find_equal_scalars() when conditional jump was verified.
> > > Use recorded information in backtrack_insn() to propagate precision.
> > >
> > > E.g. for the following program:
> > >
> > >             while verifying instructions
> > >   r1 = r0              |
> > >   if r1 < 8  goto ...  | push r0,r1 as equal_scalars in jmp_history
> > >   if r0 > 16 goto ...  | push r0,r1 as equal_scalars in jmp_history
> >
> > linked_scalars? especially now that Alexei added offsets between
> > linked registers
>
> Missed this, will update.
>
> >
> > >   r2 = r10             |
> > >   r2 += r0             v mark_chain_precision(r0)
> > >
> > >             while doing mark_chain_precision(r0)
> > >   r1 = r0              ^
> > >   if r1 < 8  goto ...  | mark r0,r1 as precise
> > >   if r0 > 16 goto ...  | mark r0,r1 as precise
> > >   r2 = r10             |
> > >   r2 += r0             | mark r0 precise
> >
> > let's reverse the order here so it's linear in how the algorithm
> > actually works (backwards)?
>
> I thought the arrow would be enough. Ok, can reverse.

it's the reverse order compared to what you'd see in the verifier log.
I did see the arrow (though it wasn't all that clear on the first
reading), but still feels like it would be better to have consistent
order with verifier log

[...]

> > > @@ -3844,6 +3974,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
> > >                          */
> > >                         bt_set_reg(bt, dreg);
> > >                         bt_set_reg(bt, sreg);
> > > +               } else if (BPF_SRC(insn->code) == BPF_K) {
> > >                          /* else dreg <cond> K
> >
> > drop "else" from the comment then? I like this change.
>
> This is actually a leftover from v1. I can drop "else" from the
> comment or drop this hunk as it is not necessary for the series.

I'd keep explicit `else if`

>
> > >                           * Only dreg still needs precision before
> > >                           * this insn, so for the K-based conditional
> > > @@ -3862,6 +3993,10 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
> > >                         /* to be analyzed */
> > >                         return -ENOTSUPP;
> > >         }
> > > +       /* Propagate precision marks to linked registers, to account for
> > > +        * registers marked as precise in this function.
> > > +        */
> > > +       bt_sync_linked_regs(bt, hist);
> >
> > Radical Andrii is fine with this, though I wonder if there is some
> > place outside of backtrack_insn() where the first
> > bt_sync_linked_regs() could be called just once?
>
> The problem here is that:
> - in theory linked_regs could be present for any instruction, thus
>   sync() is needed after get_jmp_hist_entry call in
>   __mark_chain_precision();
> - backtrack_insn() might both remove and add some registers in bt,
>   hence, to correctly handle bt_empty() call in __mark_chain_precision
>   the sync() is also needed after backtrack_insn().
>
> So, current placement is the simplest I could come up with.

agreed, let's keep it as is

[...]

> > > @@ -15312,6 +15500,21 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> > >                 return 0;
> > >         }
> > >
> > > +       /* Push scalar registers sharing same ID to jump history,
> > > +        * do this before creating 'other_branch', so that both
> > > +        * 'this_branch' and 'other_branch' share this history
> > > +        * if parent state is created.
> > > +        */
> > > +       if (BPF_SRC(insn->code) == BPF_X && src_reg->type == SCALAR_VALUE && src_reg->id)
> > > +               find_equal_scalars(this_branch, src_reg->id, &linked_regs);
> > > +       if (dst_reg->type == SCALAR_VALUE && dst_reg->id)
> > > +               find_equal_scalars(this_branch, dst_reg->id, &linked_regs);
> > > +       if (linked_regs.cnt > 1) {
> >
> > if we have just one, should it be even marked as linked?
>
> Sorry, I don't understand. Do you suggest to add an additional check
> in find_equal_scalars/collect_linked_regs and reset it if 'cnt' equals 1?

I find `if (linked_regs.cnt > 1)` check a bit weird and it feels like
it should be unnecessary. As soon as we are left with just one
"linked" register (linked with what? with itself?) it shouldn't be
linked anymore. Is there a point where we break the link between
registers where we can/should drop ID from the singularly linked
register? Why keep that scalar register ID set?

>
> [...]
> >
> > > +               err = push_jmp_history(env, this_branch, 0, linked_regs_pack(&linked_regs));
> > > +               if (err)
> > > +                       return err;
> > > +       }
> > > +
> > >         other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx,
> > >                                   false);
> > >         if (!other_branch)
> > > @@ -15336,13 +15539,13 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
> > >         if (BPF_SRC(insn->code) == BPF_X &&
> > >             src_reg->type == SCALAR_VALUE && src_reg->id &&
> > >             !WARN_ON_ONCE(src_reg->id != other_branch_regs[insn->src_reg].id)) {
> > > -               find_equal_scalars(this_branch, src_reg);
> > > -               find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg]);
> > > +               copy_known_reg(this_branch, src_reg, &linked_regs);
> > > +               copy_known_reg(other_branch, &other_branch_regs[insn->src_reg], &linked_regs);
> >
> > I liked the "sync" terminology you used for bt, so why not call this
> > "sync_linked_regs" ?
>
> I kept the current name for the function.
> Suggested name makes sense, though.
>
> [...]