Re: [PATCH bpf-next 1/3] bpf: Propagate scalar ranges through register assignments.

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Tue, 6 Oct 2020 21:15:18 -0700

On Tue, Oct 06, 2020 at 08:31:23PM -0700, Andrii Nakryiko wrote:
> 
> > 'linked' is also wrong. The regs are exactly equal.
> > In case of pkt and other pointers two regs will have the same id
> > as well, but they will not be equal. Here these two scalars are equal
> > otherwise doing *reg = *known_reg would be wrong.
> 
> Ok, I guess it also means that "reg->type == SCALAR_VALUE" checks
> below are unnecessary as well, because if known_reg->id matches, that
> means register states are exactly the same.
> > > > +               for (j = 0; j < MAX_BPF_REG; j++) {
> > > > +                       reg = &state->regs[j];
> > > > +                       if (reg->type == SCALAR_VALUE && reg->id == known_reg->id)

Right. The type check is technically unnecessary. It's a safety net in case id
assignment goes wrong plus it makes it easier to understand the logic.

> > > Even if yes, it probably would be more
> > > straightforward to call appropriate updates in the respective if
> > > branches (it's just a single line for each register, so not like it's
> > > duplicating tons of code).
> >
> > You mean inside reg_set_min_max() and inside reg_combine_min_max() ?
> > That won't work because find_equal_scalars() needs access to the whole
> > bpf_verifier_state and not just bpf_reg_state.
> 
> No, I meant something like this, few lines above:
> 
> if (BPF_SRC(insn->code) == BPF_X) {
> 
>     if (dst_reg->type == SCALAR_VALUE && src_reg->type == SCALAR_VALUE) {
>         if (...)
>         else if (...)
>         else
> 
>         /* both src/dst regs in both this/other branches could have
> been updated */
>         find_equal_scalars(this_branch, src_reg);
>         find_equal_scalars(this_branch, dst_reg);
>         find_equal_scalars(other_branch, &other_branch_regs[insn->src_reg])
>         find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg])
>     }
> } else if (dst_reg->type == SCALAR_VALUE) {
>     reg_set_min_max(...);
> 
>     /* only dst_reg in both branches could have been updated */
>     find_equal_scalars(this_branch, dst_reg);
>     find_equal_scalars(other_branch, &other_branch_regs[insn->dst_reg]);
> }
> 
> 
> This keeps find_equal_scalars() for relevant registers very close to
> places where those registers are updated, instead of jumping back and
> forth between the complicated if  after it, and double-checking under
> which circumstances dst_reg can be updated, for example.

I see it differently.
I don't like moving if (reg->id) into find_equal_scalars(). Otherwise it would
have to be named something like try_find_equal_scalars(). And even with such
"try_" prefix it's still not clean. It's my general dislike of defensive
programming. I prefer all functions to be imperative: "do" vs "try_do".
There are exception from the rule, of course. Like kfree() that accepts NULL.
That's fine.
In this case I think if (type == SCALAR && id != 0) should be done by the caller.
Note that's different from __update_reg_bounds().
There the bounds may or may not change, but the action is performed.
What you're proposing it to make find_equal_scalars() accept any kind
of register and do the action only if argument is actual scalar
and its "id != 0". That's exactly the defensive programming
that I feel make programmers sloppier.
Note that's not the same as mark_reg_unknown() doing
if (WARN_ON(regno >= MAX_BPF_REG)) check. I hope the difference is clear.