On Fri, Dec 13, 2024 at 03:02:11PM GMT, Andrii Nakryiko wrote:
> On Thu, Dec 12, 2024 at 3:23 PM Daniel Xu <dxu@xxxxxxxxx> wrote:
> >
> > This commit allows progs to elide a null check on statically known
> > map lookup keys. In other words, if the verifier can statically
> > prove that the lookup will be in-bounds, allow the prog to drop the
> > null check.
> >
> > This is useful for two reasons:
> >
> > 1. Large numbers of nullness checks (especially when they cannot
> >    fail) unnecessarily push the prog towards
> >    BPF_COMPLEXITY_LIMIT_JMP_SEQ.
> > 2. It forms a tighter contract between programmer and verifier.
> >
> > For (1), bpftrace is starting to make heavier use of percpu scratch
> > maps. As a result, for user scripts with a large number of unrolled
> > loops, we are starting to hit jump complexity verification errors.
> > These percpu lookups cannot fail anyway, as we only use static key
> > values. Eliding nullness probably results in less work for the
> > verifier as well.
> >
> > For (2), percpu scratch maps are often used as a larger stack, as
> > the current stack is limited to 512 bytes. In these situations, it
> > is desirable for the programmer to express: "this lookup should
> > never fail, and if it does, it means I messed up the code". By
> > omitting the null check, the programmer can "ask" the verifier to
> > double-check the logic.
> >
> > Tests also have to be updated in sync with these changes, as the
> > verifier is more efficient with this change. Notably, iters.c tests
> > had to be changed to use a map type that still requires null checks,
> > as they exercise verifier tracking logic w.r.t. iterators.
> >
> > Signed-off-by: Daniel Xu <dxu@xxxxxxxxx>
> > ---
> >  kernel/bpf/verifier.c                         | 80 ++++++++++++++++++-
> >  tools/testing/selftests/bpf/progs/iters.c     | 14 ++--
> >  .../selftests/bpf/progs/map_kptr_fail.c       |  2 +-
> >  .../selftests/bpf/progs/verifier_map_in_map.c |  2 +-
> >  .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
> >  5 files changed, 87 insertions(+), 13 deletions(-)
> >
>
> Eduard has great points. I've added a few more comments below.
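
For extra context, the scratch map pattern that motivates this looks
roughly like the following. This is a minimal sketch, not bpftrace's
actual codegen; the struct, map, and prog names are made up:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

struct scratch {
	char buf[4096];	/* larger than the 512 byte BPF stack */
};

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, struct scratch);
} scratch_map SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_getpid")
int use_scratch(void *ctx)
{
	__u32 key = 0;	/* constant key, provably in-bounds for a
			 * 1-entry array */
	struct scratch *s;

	s = bpf_map_lookup_elem(&scratch_map, &key);
	if (!s)		/* with this series, elidable: lookup cannot fail */
		return 0;

	s->buf[0] = 1;
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

Every elided check like this saves a jump, which is what keeps large
unrolled scripts under BPF_COMPLEXITY_LIMIT_JMP_SEQ.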
>
> pw-bot: cr
>
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 58b36cc96bd5..4947ef884a18 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -287,6 +287,7 @@ struct bpf_call_arg_meta {
> >  	u32 ret_btf_id;
> >  	u32 subprogno;
> >  	struct btf_field *kptr_field;
> > +	s64 const_map_key;
> >  };
> >
> >  struct bpf_kfunc_call_arg_meta {
> > @@ -9163,6 +9164,53 @@ static int check_reg_const_str(struct bpf_verifier_env *env,
> >  	return 0;
> >  }
> >
> > +/* Returns constant key value if possible, else -1 */
> > +static s64 get_constant_map_key(struct bpf_verifier_env *env,
> > +				struct bpf_reg_state *key,
> > +				u32 key_size)
> > +{
> > +	struct bpf_func_state *state = func(env, key);
> > +	struct bpf_reg_state *reg;
> > +	int zero_size = 0;
> > +	int stack_off;
> > +	u8 *stype;
> > +	int slot;
> > +	int spi;
> > +	int i;
> > +
> > +	if (!env->bpf_capable)
> > +		return -1;
> > +	if (key->type != PTR_TO_STACK)
> > +		return -1;
> > +	if (!tnum_is_const(key->var_off))
> > +		return -1;
> > +
> > +	stack_off = key->off + key->var_off.value;
> > +	slot = -stack_off - 1;
> > +	spi = slot / BPF_REG_SIZE;
> > +
> > +	/* First handle precisely tracked STACK_ZERO, up to BPF_REG_SIZE */
> > +	stype = state->stack[spi].slot_type;
> > +	for (i = 0; i < BPF_REG_SIZE && stype[i] == STACK_ZERO; i++)
>
> it's Friday and I'm lazy, but please double-check that this works for
> both big-endian and little-endian :)

Any tips? Are the existing tests that run through s390x hosts in CI
sufficient, or should I add some tests written in C (and not BPF
assembly)? I can never think about endianness correctly...

> with Eduard's suggestion this also becomes interesting when you have
> 000mmm mix (as one example), because that gives you a small range, and
> all values might be valid keys for arrays

Can you define what "small range" means? What range is there with 0's?
Any pointers would be helpful.

> > +		zero_size++;
> > +	if (zero_size == key_size)
> > +		return 0;
> > +
> > +	if (!is_spilled_reg(&state->stack[spi]))
> > +		/* Not pointer to stack */

> !is_spilled_reg and "Not pointer to stack" seem to be not exactly the
> same things?

You're right - the comment is not helpful. I'll make the change to use
is_spilled_scalar_reg(), which is probably as clear as it gets.

[..]
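
Concretely, for the tail of get_constant_map_key() I'm thinking
something along these lines (untested sketch; is_spilled_scalar_reg()
and spilled_ptr already exist in verifier.c, and any partial-spill vs
key_size handling is left out):

	/* Second, handle the case where the key was spilled to the
	 * stack from a register; only a spilled known-constant scalar
	 * can produce a constant key.
	 */
	if (!is_spilled_scalar_reg(&state->stack[spi]))
		return -1;

	reg = &state->stack[spi].spilled_ptr;
	if (!tnum_is_const(reg->var_off))
		return -1;

	return reg->var_off.value;

That way the check and the comment agree: we bail unless the slot holds
a spilled scalar with a known constant value.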