Re: [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire

Yonghong Song <yonghong.song@xxxxxxxxx> · Wed, 2 Aug 2023 14:41:17 -0700

On 8/2/23 12:23 PM, Dave Marchevsky wrote:


On 8/1/23 11:57 PM, Yonghong Song wrote:


On 8/1/23 1:36 PM, Dave Marchevsky wrote:
It's straightforward to prove that kptr_struct_meta must be non-NULL for
any valid call to these kfuncs:

    * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
      struct in user BTF with a special field (e.g. bpf_refcount,
      {rb,list}_node). These are stored in that BTF's struct_meta_tab.

    * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
      have {rb,list}_node field and that it's at the correct offset.
      Similarly, check_kfunc_args ensures bpf_refcount field existence for
      node param to bpf_refcount_acquire.

    * So a btf_struct_meta must have been created for the struct type of
      node param to these kfuncs

    * That BTF and its struct_meta_tab are guaranteed to still be around.
      Any arbitrary {rb,list} node the BPF program interacts with either:
      came from bpf_obj_new or a collection removal kfunc in the same
      program, in which case the BTF is associated with the program and
      still around; or came from bpf_kptr_xchg, in which case the BTF was
      associated with the map and is still around

Instead of silently continuing with NULL struct_meta, which caused
confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
kptr_struct_meta for node param to list and rbtree insert funcs"), let's
error out. Then, at runtime, we can confidently say that the
implementations of these kfuncs were given a non-NULL kptr_struct_meta,
meaning that special-field-specific functionality like
bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
series are guaranteed to execute.

The subject says '... for collection insert and refcount_acquire'.
Why picks these? We could check for all kptr_struct_meta use cases?


fixup_kfunc_call sets kptr_struct_meta arg for the following kfuncs:

   - bpf_obj_new_impl
   - bpf_obj_drop_impl
   - collection insert kfuncs
     - bpf_rbtree_add_impl
     - bpf_list_push_{front,back}_impl
   - bpf_refcount_acquire_impl

A btf_struct_meta is only created for a struct if it has a non-null btf_record,
which in turn only happens if the struct has any special fields (spin_lock,
refcount, {rb,list}_node, etc.). Since it's valid to call bpf_obj_new on a
struct type without any special fields, the kptr_struct_meta arg can be
NULL. The result of such bpf_obj_new allocation must be bpf_obj_drop-able, so
the same holds for that kfunc.

By definition rbtree and list nodes must be some struct type w/
struct bpf_{rb,list}_node field, and similar logic for refcounted, so if there's
no kptr_struct_meta for their node arg, there was some verifier-internal issue.



This patch doesn't change functionality, just makes it easier to reason
about existing functionality.

Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
---
   kernel/bpf/verifier.c | 14 ++++++++++++++
   1 file changed, 14 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e7b1af016841..ec37e84a11c6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18271,6 +18271,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
           struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
           struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
   +        if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&

Why check for KF_bpf_refcount_acquire_impl? We can cover all cases in this 'if' branch, right?


The body of this 'else if' also handles kptr_struct_meta setup for bpf_obj_drop,
for which NULL kptr_struct_meta is valid.

+            !kptr_struct_meta) {
+            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
+                insn_idx);
+            return -EFAULT;
+        }
+
           insn_buf[0] = addr[0];
           insn_buf[1] = addr[1];
           insn_buf[2] = *insn;
@@ -18278,6 +18285,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
       } else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
              desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
              desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
+        struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
           int struct_meta_reg = BPF_REG_3;
           int node_offset_reg = BPF_REG_4;
   @@ -18287,6 +18295,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
               node_offset_reg = BPF_REG_5;
           }
   +        if (!kptr_struct_meta) {
+            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
+                insn_idx);
+            return -EFAULT;
+        }
+
           __fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
                           node_offset_reg, insn, insn_buf, cnt);
       } else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||

In my opinion, such selective defensive programming is not necessary. By searching kptr_struct_meta in the code, it is reasonably easy to find
whether we have any mismatch or not. Also self test coverage should
cover these cases (probably already) right?

If the defensive programming here is still desirable to warn at verification time, I think we should just check all of uses for kptr_struct_meta.

Something like this patch probably should've been included with the series
containing 2140a6e3422d ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs"),
since that commit found that kptr_struct_meta wasn't being set for collection
insert kfuncs and fixed the issue. It was annoyingly hard to root-cause
because, among other things, many of these kfunc impls check that
the btf_struct_meta is non-NULL before using it, with some fallback logic.
I don't like those unnecessary NULL checks either, and considered removing
them in this patch, but decided to leave them in since we already had
a case where struct_meta wasn't being set.

On second thought, maybe it's better to take the unnecessary runtime checks
out and leave these verification-time checks in. If, at runtime, those kfuncs
see a NULL btf_struct_meta, I'd rather they fail loudly in the future
with a NULL deref splat, than potentially leaking memory or similarly
subtle failures. WDYT?

Certainly I agree with you that verification failure is much better than
debugging runtime.

Here, we covered a few kfunc which always requires non-NULL 
kptr_struct_meta. But as you mentioned in the above, we also have
cases where for a kfunc, the kptr_struct_meta could be NULL or non-NULL.

Let us say, kfunc bpf_obj_new_impl, for some cases, the kptr_struct_meta
cannot be NULL based on bpf prog, but somehow, the verifier passes
a NULL ptr to the program. Should we check this at fixup_kfunc_call()
as well?


I don't feel particularly strongly about these verification-time checks,
but the level of 'selective defensive programming' here feels similar to
other 'verifier internal error' checks sprinkled throughout verifier.c,
so that argument doesn't feel very persuasive to me.

I am okay with this patch but I wonder whether we can cover more
cases.