On Wed, Oct 2, 2024 at 5:12 PM Daniel Xu <dxu@xxxxxxxxx> wrote: > > This commit allows progs to elide a null check on statically known map > lookup keys. In other words, if the verifier can statically prove that > the lookup will be in-bounds, allow the prog to drop the null check. > > This is useful for two reasons: > > 1. Large numbers of nullness checks (especially when they cannot fail) > unnecessarily pushes prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ. > 2. It forms a tighter contract between programmer and verifier. > > For (1), bpftrace is starting to make heavier use of percpu scratch > maps. As a result, for user scripts with large number of unrolled loops, > we are starting to hit jump complexity verification errors. These > percpu lookups cannot fail anyways, as we only use static key values. > Eliding nullness probably results in less work for verifier as well. > > For (2), percpu scratch maps are often used as a larger stack, as the > currrent stack is limited to 512 bytes. In these situations, it is > desirable for the programmer to express: "this lookup should never fail, > and if it does, it means I messed up the code". By omitting the null > check, the programmer can "ask" the verifier to double check the logic. > > Tests also have to be updated in sync with these changes, as the > verifier is more efficient with this change. Notable, iters.c tests had > to be changed to use a map type that still requires null checks, as it's > exercising verifier tracking logic w.r.t iterators. > > Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx> > Signed-off-by: Daniel Xu <dxu@xxxxxxxxx> > --- > kernel/bpf/verifier.c | 73 ++++++++++++++++++- > tools/testing/selftests/bpf/progs/iters.c | 14 ++-- > .../selftests/bpf/progs/map_kptr_fail.c | 2 +- > .../selftests/bpf/progs/verifier_map_in_map.c | 2 +- > .../testing/selftests/bpf/verifier/map_kptr.c | 2 +- > 5 files changed, 82 insertions(+), 11 deletions(-) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 7d9b38ffd220..9ee2cd6c0508 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -284,6 +284,7 @@ struct bpf_call_arg_meta { > u32 ret_btf_id; > u32 subprogno; > struct btf_field *kptr_field; > + long const_map_key; key might collide with special -1 on 32-bit archs ? s64 instead ? > }; > > struct bpf_kfunc_call_arg_meta { > @@ -10416,6 +10417,60 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno > state->callback_subprogno == subprogno); > } > > +/* Returns whether or not the given map type can potentially elide > + * lookup return value nullness check. This is possible if the key > + * is statically known. > + */ > +static bool can_elide_value_nullness(enum bpf_map_type type) > +{ > + switch (type) { > + case BPF_MAP_TYPE_ARRAY: > + case BPF_MAP_TYPE_PERCPU_ARRAY: > + return true; > + default: > + return false; > + } > +} > + > +/* Returns constant key value if possible, else -1 */ > +static long get_constant_map_key(struct bpf_verifier_env *env, > + struct bpf_reg_state *key) > +{ > + struct bpf_func_state *state = func(env, key); > + struct bpf_reg_state *reg; > + int stack_off; > + int slot; > + int spi; > + > + if (!env->bpf_capable) > + return -1; > + if (key->type != PTR_TO_STACK) > + return -1; > + if (!tnum_is_const(key->var_off)) > + return -1; > + > + stack_off = key->off + key->var_off.value; > + slot = -stack_off - 1; > + if (slot < 0) > + /* Stack grew upwards. This is properly checked > + * as part of helper argument processing, but since > + * this runs before those checks, we need to preemptively > + * do this here. > + */ > + return -1; > + else if (slot >= state->allocated_stack) > + /* Stack uninitialized */ > + return -1; > + > + spi = slot / BPF_REG_SIZE; > + reg = &state->stack[spi].spilled_ptr; reg->type == SCALAR check is missing. slot->type should also be checked. spiller_ptr is only correct for STACK_SPILL. it should also recognize that STACK_ZERO means zero. > + if (!tnum_is_const(reg->var_off)) > + /* Stack value not statically known */ > + return -1; I suspect tnum_is_const could pass for pointers and other !scalar. > + > + return reg->var_off.value; and this will be zero by accident. Bits of check_stack_read_fixed_off() should probably be factored out and used here. > +} > + > static int get_helper_proto(struct bpf_verifier_env *env, int func_id, > const struct bpf_func_proto **ptr) > { > @@ -10513,6 +10568,15 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn > env->insn_aux_data[insn_idx].storage_get_func_atomic = true; > } > > + /* Logically we are trying to check on key register state before > + * the helper is called, so process here. Otherwise argument processing > + * may clobber the spilled key values. > + */ > + regs = cur_regs(env); > + if (func_id == BPF_FUNC_map_lookup_elem) > + meta.const_map_key = get_constant_map_key(env, ®s[BPF_REG_2]); > + > + I think the logic is too specific. Let's make it more generic. It probably better to do later in: case ARG_PTR_TO_MAP_KEY: after check_helper_mem_access() returns 0 and store into meta.const_map_key. > meta.func_id = func_id; > /* check args */ > for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) { > @@ -10773,10 +10837,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn > "kernel subsystem misconfigured verifier\n"); > return -EINVAL; > } > + > + if (func_id == BPF_FUNC_map_lookup_elem && and only here check for func_id. Saving meta.const_map_key can be unconditional for all ARG_PTR_TO_MAP_KEY. We can use it in a follow up to optimize lookup/update from arrays. Currently array_map_gen_lookup() still emits a load from key and bounds check. All that can be optimized with const_map_key. > + can_elide_value_nullness(meta.map_ptr->map_type) && > + meta.const_map_key >= 0 && > + meta.const_map_key < meta.map_ptr->max_entries) > + ret_flag &= ~PTR_MAYBE_NULL; > + > regs[BPF_REG_0].map_ptr = meta.map_ptr; > regs[BPF_REG_0].map_uid = meta.map_uid; > regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag; > - if (!type_may_be_null(ret_type) && > + if (!type_may_be_null(regs[BPF_REG_0].type) && I would use ret_flag here instead of R0.type. pw-bot: cr