On Tue, Nov 15, 2022 at 11:49:12AM IST, Alexei Starovoitov wrote:
> On Tue, Nov 15, 2022 at 12:45:38AM +0530, Kumar Kartikeya Dwivedi wrote:
> > Introduce a type-safe memory allocator bpf_obj_new for BPF programs. The
> > kernel side kfunc is named bpf_obj_new_impl, as passing hidden arguments
> > to kfuncs still requires having them in the prototype, unlike BPF helpers
> > which always take 5 arguments and have them checked using bpf_func_proto
> > in the verifier, ignoring unset argument types.
> >
> > Introduce an __ign suffix to ignore a specific kfunc argument during type
> > checks, then use this to introduce support for passing type metadata to
> > the bpf_obj_new_impl kfunc.
> >
> > The user passes the BTF ID of the type it wants to allocate in program
> > BTF, the verifier then rewrites the first argument as the size of this
> > type, after performing some sanity checks (to ensure it exists and is a
> > struct type).
> >
> > The second argument is also fixed up and passed by the verifier. This is
> > the btf_struct_meta for the type being allocated. It is needed mostly
> > for the offset array, which is required for zero initializing special
> > fields while leaving the rest of the storage in an uninitialized state.
> >
> > It is also needed in the next patch to perform proper destruction of the
> > object's special fields.
> >
> > Under the hood, bpf_obj_new will call bpf_mem_alloc and bpf_mem_free,
> > using the any-context BPF memory allocator introduced recently. To this
> > end, a global instance of the BPF memory allocator is initialized on
> > boot to be used for this purpose. This 'bpf_global_ma' serves all
> > allocations for bpf_obj_new. In the future, bpf_obj_new variants will
> > allow specifying a custom allocator.
> >
> > Note that now that bpf_obj_new can be used to allocate objects that can
> > be linked into a BPF linked list (when future linked list helpers are
> > available), we need to also free the elements using bpf_mem_free.
> > However, since the draining of elements is done outside the
> > bpf_spin_lock, we need to do migrate_disable around the call, since
> > bpf_list_head_free can be called from the map free path where migration
> > is enabled. Otherwise, when called from BPF programs, migration is
> > already disabled.
> >
> > A convenience macro is included in the bpf_experimental.h header to hide
> > the ugly details of the implementation, leading to user code that looks
> > similar to a language-level extension which allocates and constructs
> > fields of a user type.
> >
> >   struct bar {
> >       struct bpf_list_node node;
> >   };
> >
> >   struct foo {
> >       struct bpf_spin_lock lock;
> >       struct bpf_list_head head __contains(bar, node);
> >   };
> >
> >   void prog(void) {
> >       struct foo *f;
> >
> >       f = bpf_obj_new(typeof(*f));
> >       if (!f)
> >           return;
> >       ...
> >   }
> >
> > A key piece of this story is still missing, i.e. the free function,
> > which will come in the next patch.
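
The bpf_experimental.h hunk itself is not quoted here; as a rough sketch of
the wrapper described above (assuming the usual __ksym extern plus
bpf_core_type_id_local() pattern; the actual header contents may differ), the
convenience macro boils down to something like:

  /* Matches the kernel-side bpf_obj_new_impl kfunc added in the helpers.c
   * hunk below.
   */
  extern void *bpf_obj_new_impl(__u64 local_type_id, void *meta) __ksym;

  /* Pass the BTF type ID of 'type' in program BTF as the first argument and
   * leave the second (ignored) argument NULL for the verifier to fix up with
   * the btf_struct_meta of the type.
   */
  #define bpf_obj_new(type) \
          ((type *)bpf_obj_new_impl(bpf_core_type_id_local(type), NULL))

With such a wrapper, user code reduces to the f = bpf_obj_new(typeof(*f))
call shown in the example above.
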
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
> > ---
> >  include/linux/bpf.h                           |  21 ++--
> >  include/linux/bpf_verifier.h                  |   2 +
> >  kernel/bpf/core.c                             |  16 +++
> >  kernel/bpf/helpers.c                          |  47 ++++++--
> >  kernel/bpf/verifier.c                         | 107 ++++++++++++++++--
> >  .../testing/selftests/bpf/bpf_experimental.h  |  25 ++++
> >  6 files changed, 195 insertions(+), 23 deletions(-)
> >  create mode 100644 tools/testing/selftests/bpf/bpf_experimental.h
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 62a16b699e71..4635e31bd6fc 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -54,6 +54,8 @@ struct cgroup;
> >  extern struct idr btf_idr;
> >  extern spinlock_t btf_idr_lock;
> >  extern struct kobject *btf_kobj;
> > +extern struct bpf_mem_alloc bpf_global_ma;
> > +extern bool bpf_global_ma_set;
> >
> >  typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
> >  typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
> > @@ -333,16 +335,19 @@ static inline bool btf_record_has_field(const struct btf_record *rec, enum btf_f
> >  	return rec->field_mask & type;
> >  }
> >
> > -static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> > +static inline void bpf_obj_init(const struct btf_field_offs *foffs, void *obj)
> >  {
> > -	if (!IS_ERR_OR_NULL(map->record)) {
> > -		struct btf_field *fields = map->record->fields;
> > -		u32 cnt = map->record->cnt;
> > -		int i;
> > +	int i;
> >
> > -		for (i = 0; i < cnt; i++)
> > -			memset(dst + fields[i].offset, 0, btf_field_type_size(fields[i].type));
> > -	}
> > +	if (!foffs)
> > +		return;
> > +	for (i = 0; i < foffs->cnt; i++)
> > +		memset(obj + foffs->field_off[i], 0, foffs->field_sz[i]);
> > +}
> > +
> > +static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> > +{
> > +	bpf_obj_init(map->field_offs, dst);
> >  }
> >
> >  /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 887fa4d922f6..306fc1d6cc4a 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -427,6 +427,8 @@ struct bpf_insn_aux_data {
> >  		 */
> >  		struct bpf_loop_inline_state loop_inline_state;
> >  	};
> > +	u64 obj_new_size; /* remember the size of type passed to bpf_obj_new to rewrite R1 */
> > +	struct btf_struct_meta *kptr_struct_meta;
> >  	u64 map_key_state; /* constant (32 bit) key tracking for maps */
> >  	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
> >  	u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 9c16338bcbe8..2e57fc839a5c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -34,6 +34,7 @@
> >  #include <linux/log2.h>
> >  #include <linux/bpf_verifier.h>
> >  #include <linux/nodemask.h>
> > +#include <linux/bpf_mem_alloc.h>
> >
> >  #include <asm/barrier.h>
> >  #include <asm/unaligned.h>
> > @@ -60,6 +61,9 @@
> >  #define CTX	regs[BPF_REG_CTX]
> >  #define IMM	insn->imm
> >
> > +struct bpf_mem_alloc bpf_global_ma;
> > +bool bpf_global_ma_set;
> > +
> >  /* No hurry in this branch
> >   *
> >   * Exported for the bpf jit load helper.
> > @@ -2746,6 +2750,18 @@ int __weak bpf_arch_text_invalidate(void *dst, size_t len)
> >  	return -ENOTSUPP;
> >  }
> >
> > +#ifdef CONFIG_BPF_SYSCALL
> > +static int __init bpf_global_ma_init(void)
> > +{
> > +	int ret;
> > +
> > +	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
> > +	bpf_global_ma_set = !ret;
> > +	return ret;
> > +}
> > +late_initcall(bpf_global_ma_init);
> > +#endif
> > +
> >  DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
> >  EXPORT_SYMBOL(bpf_stats_enabled_key);
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 5bc0b9f0f306..c4f1c22cc44c 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -19,6 +19,7 @@
> >  #include <linux/proc_ns.h>
> >  #include <linux/security.h>
> >  #include <linux/btf_ids.h>
> > +#include <linux/bpf_mem_alloc.h>
> >
> >  #include "../../lib/kstrtox.h"
> >
> > @@ -1735,25 +1736,57 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
> >
> >  		obj -= field->list_head.node_offset;
> >  		head = head->next;
> > -		/* TODO: Rework later */
> > -		kfree(obj);
> > +		/* The contained type can also have resources, including a
> > +		 * bpf_list_head which needs to be freed.
> > +		 */
> > +		bpf_obj_free_fields(field->list_head.value_rec, obj);
> > +		/* bpf_mem_free requires migrate_disable(), since we can be
> > +		 * called from map free path as well apart from BPF program (as
> > +		 * part of map ops doing bpf_obj_free_fields).
> > +		 */
> > +		migrate_disable();
> > +		bpf_mem_free(&bpf_global_ma, obj);
> > +		migrate_enable();
> >  	}
> >  }
> >
> > -BTF_SET8_START(tracing_btf_ids)
> > +__diag_push();
> > +__diag_ignore_all("-Wmissing-prototypes",
> > +		  "Global functions as their definitions will be in vmlinux BTF");
> > +
> > +void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
> > +{
> > +	struct btf_struct_meta *meta = meta__ign;
> > +	u64 size = local_type_id__k;
> > +	void *p;
> > +
> > +	if (unlikely(!bpf_global_ma_set))
> > +		return NULL;
> > +	p = bpf_mem_alloc(&bpf_global_ma, size);
> > +	if (!p)
> > +		return NULL;
> > +	if (meta)
> > +		bpf_obj_init(meta->field_offs, p);
> > +	return p;
> > +}
> > +
> > +__diag_pop();
> > +
> > +BTF_SET8_START(generic_btf_ids)
> >  #ifdef CONFIG_KEXEC_CORE
> >  BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
> >  #endif
> > -BTF_SET8_END(tracing_btf_ids)
> > +BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
> > +BTF_SET8_END(generic_btf_ids)
> >
> > -static const struct btf_kfunc_id_set tracing_kfunc_set = {
> > +static const struct btf_kfunc_id_set generic_kfunc_set = {
> >  	.owner = THIS_MODULE,
> > -	.set = &tracing_btf_ids,
> > +	.set = &generic_btf_ids,
> >  };
> >
> >  static int __init kfunc_init(void)
> >  {
> > -	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &tracing_kfunc_set);
> > +	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
> >  }
> >
> >  late_initcall(kfunc_init);
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index a4a1424b19a5..c7f5d83783db 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -7948,6 +7948,11 @@ static bool is_kfunc_arg_sfx_constant(const struct btf *btf, const struct btf_pa
> >  	return __kfunc_param_match_suffix(btf, arg, "__k");
> >  }
> >
> > +static bool is_kfunc_arg_sfx_ignore(const struct btf *btf, const struct btf_param *arg)
> > +{
> > +	return __kfunc_param_match_suffix(btf, arg, "__ign");
> > +}
> > +
> >  static bool is_kfunc_arg_ret_buf_size(const struct btf *btf,
> >  				      const struct btf_param *arg,
> >  				      const struct bpf_reg_state *reg,
> > @@ -8216,6 +8221,10 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> >  		int kf_arg_type;
> >
> >  		t = btf_type_skip_modifiers(btf, args[i].type, NULL);
> > +
> > +		if (is_kfunc_arg_sfx_ignore(btf, &args[i]))
> > +			continue;
> > +
> >  		if (btf_type_is_scalar(t)) {
> >  			if (reg->type != SCALAR_VALUE) {
> >  				verbose(env, "R%d is not a scalar\n", regno);
> > @@ -8395,6 +8404,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> >  	return 0;
> >  }
> >
> > +enum special_kfunc_type {
> > +	KF_bpf_obj_new_impl,
> > +};
> > +
> > +BTF_SET_START(special_kfunc_set)
> > +BTF_ID(func, bpf_obj_new_impl)
> > +BTF_SET_END(special_kfunc_set)
> > +
> > +BTF_ID_LIST(special_kfunc_list)
> > +BTF_ID(func, bpf_obj_new_impl)
> > +
> >  static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >  			    int *insn_idx_p)
> >  {
> > @@ -8469,17 +8489,64 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> >  	t = btf_type_skip_modifiers(desc_btf, func_proto->type, NULL);
> >
> >  	if (is_kfunc_acquire(&meta) && !btf_type_is_struct_ptr(meta.btf, t)) {
> > -		verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
> > -		return -EINVAL;
> > +		/* Only exception is bpf_obj_new_impl */
> > +		if (meta.btf != btf_vmlinux || meta.func_id != special_kfunc_list[KF_bpf_obj_new_impl]) {
> > +			verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
> > +			return -EINVAL;
> > +		}
> >  	}
> >
> >  	if (btf_type_is_scalar(t)) {
> >  		mark_reg_unknown(env, regs, BPF_REG_0);
> >  		mark_btf_func_reg_size(env, BPF_REG_0, t->size);
> >  	} else if (btf_type_is_ptr(t)) {
> > -		ptr_type = btf_type_skip_modifiers(desc_btf, t->type,
> > -						   &ptr_type_id);
> > -		if (!btf_type_is_struct(ptr_type)) {
> > +		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
> > +
> > +		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
> > +			if (!btf_type_is_void(ptr_type)) {
> > +				verbose(env, "kernel function %s must have void * return type\n",
> > +					meta.func_name);
> > +				return -EINVAL;
> > +			}
>
> Here you're checking that void *bpf_obj_new_impl and obj_drop actually were
> declared with 'void *' return ?
> and wasting run-time cycles to check that??
> Please don't.
>
> Even if we miss that during code review there is no safety issue.
>

Right, I think the later linked list patch actually drops this hunk anyway. We
fail for any of the other unhandled cases anyway.
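
For completeness, the argument suffix convention the check_kfunc_args() hunk
above relies on is: a "__k" suffix requires the argument to be a constant
known to the verifier, while "__ign" makes the verifier skip type checking of
that argument entirely so it can be rewritten during kfunc call fixup. A
hypothetical kfunc using both (the names below are made up; in this patch only
bpf_obj_new_impl uses the convention) would look like:

  /* Hypothetical kfunc, shown only to illustrate the naming convention.
   * cnt__k must be a verifier-known constant scalar, and priv__ign is not
   * type checked at all: check_kfunc_args() sees the "__ign" suffix and
   * skips the argument, leaving the verifier free to patch that register
   * itself.
   */
  void *bpf_example_alloc_impl(u64 cnt__k, void *priv__ign)
  {
  	return NULL;
  }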