On 8/1/23 11:57 PM, Yonghong Song wrote: > > > On 8/1/23 1:36 PM, Dave Marchevsky wrote: >> It's straightforward to prove that kptr_struct_meta must be non-NULL for >> any valid call to these kfuncs: >> >> * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any >> struct in user BTF with a special field (e.g. bpf_refcount, >> {rb,list}_node). These are stored in that BTF's struct_meta_tab. >> >> * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes >> have {rb,list}_node field and that it's at the correct offset. >> Similarly, check_kfunc_args ensures bpf_refcount field existence for >> node param to bpf_refcount_acquire. >> >> * So a btf_struct_meta must have been created for the struct type of >> node param to these kfuncs >> >> * That BTF and its struct_meta_tab are guaranteed to still be around. >> Any arbitrary {rb,list} node the BPF program interacts with either: >> came from bpf_obj_new or a collection removal kfunc in the same >> program, in which case the BTF is associated with the program and >> still around; or came from bpf_kptr_xchg, in which case the BTF was >> associated with the map and is still around >> >> Instead of silently continuing with NULL struct_meta, which caused >> confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set >> kptr_struct_meta for node param to list and rbtree insert funcs"), let's >> error out. Then, at runtime, we can confidently say that the >> implementations of these kfuncs were given a non-NULL kptr_struct_meta, >> meaning that special-field-specific functionality like >> bpf_obj_free_fields and the bpf_obj_drop change introduced later in this >> series are guaranteed to execute. > > The subject says '... for collection insert and refcount_acquire'. > Why picks these? We could check for all kptr_struct_meta use cases? > fixup_kfunc_call sets kptr_struct_meta arg for the following kfuncs: - bpf_obj_new_impl - bpf_obj_drop_impl - collection insert kfuncs - bpf_rbtree_add_impl - bpf_list_push_{front,back}_impl - bpf_refcount_acquire_impl A btf_struct_meta is only created for a struct if it has a non-null btf_record, which in turn only happens if the struct has any special fields (spin_lock, refcount, {rb,list}_node, etc.). Since it's valid to call bpf_obj_new on a struct type without any special fields, the kptr_struct_meta arg can be NULL. The result of such bpf_obj_new allocation must be bpf_obj_drop-able, so the same holds for that kfunc. By definition rbtree and list nodes must be some struct type w/ struct bpf_{rb,list}_node field, and similar logic for refcounted, so if there's no kptr_struct_meta for their node arg, there was some verifier-internal issue. >> >> This patch doesn't change functionality, just makes it easier to reason >> about existing functionality. >> >> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx> >> --- >> kernel/bpf/verifier.c | 14 ++++++++++++++ >> 1 file changed, 14 insertions(+) >> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >> index e7b1af016841..ec37e84a11c6 100644 >> --- a/kernel/bpf/verifier.c >> +++ b/kernel/bpf/verifier.c >> @@ -18271,6 +18271,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, >> struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta; >> struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) }; >> + if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] && > > Why check for KF_bpf_refcount_acquire_impl? We can cover all cases in this 'if' branch, right? > The body of this 'else if' also handles kptr_struct_meta setup for bpf_obj_drop, for which NULL kptr_struct_meta is valid. >> + !kptr_struct_meta) { >> + verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n", >> + insn_idx); >> + return -EFAULT; >> + } >> + >> insn_buf[0] = addr[0]; >> insn_buf[1] = addr[1]; >> insn_buf[2] = *insn; >> @@ -18278,6 +18285,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, >> } else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] || >> desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] || >> desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) { >> + struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta; >> int struct_meta_reg = BPF_REG_3; >> int node_offset_reg = BPF_REG_4; >> @@ -18287,6 +18295,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, >> node_offset_reg = BPF_REG_5; >> } >> + if (!kptr_struct_meta) { >> + verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n", >> + insn_idx); >> + return -EFAULT; >> + } >> + >> __fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg, >> node_offset_reg, insn, insn_buf, cnt); >> } else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] || > > In my opinion, such selective defensive programming is not necessary. By searching kptr_struct_meta in the code, it is reasonably easy to find > whether we have any mismatch or not. Also self test coverage should > cover these cases (probably already) right? > > If the defensive programming here is still desirable to warn at verification time, I think we should just check all of uses for kptr_struct_meta. Something like this patch probably should've been included with the series containing 2140a6e3422d ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs"), since that commit found that kptr_struct_meta wasn't being set for collection insert kfuncs and fixed the issue. It was annoyingly hard to root-cause because, among other things, many of these kfunc impls check that the btf_struct_meta is non-NULL before using it, with some fallback logic. I don't like those unnecessary NULL checks either, and considered removing them in this patch, but decided to leave them in since we already had a case where struct_meta wasn't being set. On second thought, maybe it's better to take the unnecessary runtime checks out and leave these verification-time checks in. If, at runtime, those kfuncs see a NULL btf_struct_meta, I'd rather they fail loudly in the future with a NULL deref splat, than potentially leaking memory or similarly subtle failures. WDYT? I don't feel particularly strongly about these verification-time checks, but the level of 'selective defensive programming' here feels similar to other 'verifier internal error' checks sprinkled throughout verifier.c, so that argument doesn't feel very persuasive to me.