Re: [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire

Dave Marchevsky <davemarchevsky@xxxxxxxxx> · Wed, 2 Aug 2023 15:23:06 -0400

On 8/1/23 11:57 PM, Yonghong Song wrote:
> 
> 
> On 8/1/23 1:36 PM, Dave Marchevsky wrote:
>> It's straightforward to prove that kptr_struct_meta must be non-NULL for
>> any valid call to these kfuncs:
>>
>>    * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
>>      struct in user BTF with a special field (e.g. bpf_refcount,
>>      {rb,list}_node). These are stored in that BTF's struct_meta_tab.
>>
>>    * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
>>      have {rb,list}_node field and that it's at the correct offset.
>>      Similarly, check_kfunc_args ensures bpf_refcount field existence for
>>      node param to bpf_refcount_acquire.
>>
>>    * So a btf_struct_meta must have been created for the struct type of
>>      node param to these kfuncs
>>
>>    * That BTF and its struct_meta_tab are guaranteed to still be around.
>>      Any arbitrary {rb,list} node the BPF program interacts with either:
>>      came from bpf_obj_new or a collection removal kfunc in the same
>>      program, in which case the BTF is associated with the program and
>>      still around; or came from bpf_kptr_xchg, in which case the BTF was
>>      associated with the map and is still around
>>
>> Instead of silently continuing with NULL struct_meta, which caused
>> confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
>> kptr_struct_meta for node param to list and rbtree insert funcs"), let's
>> error out. Then, at runtime, we can confidently say that the
>> implementations of these kfuncs were given a non-NULL kptr_struct_meta,
>> meaning that special-field-specific functionality like
>> bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
>> series are guaranteed to execute.
> 
> The subject says '... for collection insert and refcount_acquire'.
> Why picks these? We could check for all kptr_struct_meta use cases?
> 

fixup_kfunc_call sets kptr_struct_meta arg for the following kfuncs:

  - bpf_obj_new_impl
  - bpf_obj_drop_impl
  - collection insert kfuncs
    - bpf_rbtree_add_impl
    - bpf_list_push_{front,back}_impl
  - bpf_refcount_acquire_impl

A btf_struct_meta is only created for a struct if it has a non-null btf_record,
which in turn only happens if the struct has any special fields (spin_lock,
refcount, {rb,list}_node, etc.). Since it's valid to call bpf_obj_new on a
struct type without any special fields, the kptr_struct_meta arg can be
NULL. The result of such bpf_obj_new allocation must be bpf_obj_drop-able, so
the same holds for that kfunc.

By definition rbtree and list nodes must be some struct type w/
struct bpf_{rb,list}_node field, and similar logic for refcounted, so if there's
no kptr_struct_meta for their node arg, there was some verifier-internal issue.

>>
>> This patch doesn't change functionality, just makes it easier to reason
>> about existing functionality.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
>> ---
>>   kernel/bpf/verifier.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index e7b1af016841..ec37e84a11c6 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -18271,6 +18271,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>           struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>           struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
>>   +        if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
> 
> Why check for KF_bpf_refcount_acquire_impl? We can cover all cases in this 'if' branch, right?
> 

The body of this 'else if' also handles kptr_struct_meta setup for bpf_obj_drop,
for which NULL kptr_struct_meta is valid. 

>> +            !kptr_struct_meta) {
>> +            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
>> +                insn_idx);
>> +            return -EFAULT;
>> +        }
>> +
>>           insn_buf[0] = addr[0];
>>           insn_buf[1] = addr[1];
>>           insn_buf[2] = *insn;
>> @@ -18278,6 +18285,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>       } else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
>>              desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
>>              desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
>> +        struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>           int struct_meta_reg = BPF_REG_3;
>>           int node_offset_reg = BPF_REG_4;
>>   @@ -18287,6 +18295,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>               node_offset_reg = BPF_REG_5;
>>           }
>>   +        if (!kptr_struct_meta) {
>> +            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
>> +                insn_idx);
>> +            return -EFAULT;
>> +        }
>> +
>>           __fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
>>                           node_offset_reg, insn, insn_buf, cnt);
>>       } else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
> 
> In my opinion, such selective defensive programming is not necessary. By searching kptr_struct_meta in the code, it is reasonably easy to find
> whether we have any mismatch or not. Also self test coverage should
> cover these cases (probably already) right?
> 
> If the defensive programming here is still desirable to warn at verification time, I think we should just check all of uses for kptr_struct_meta.

Something like this patch probably should've been included with the series
containing 2140a6e3422d ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs"),
since that commit found that kptr_struct_meta wasn't being set for collection
insert kfuncs and fixed the issue. It was annoyingly hard to root-cause
because, among other things, many of these kfunc impls check that
the btf_struct_meta is non-NULL before using it, with some fallback logic.
I don't like those unnecessary NULL checks either, and considered removing
them in this patch, but decided to leave them in since we already had
a case where struct_meta wasn't being set.

On second thought, maybe it's better to take the unnecessary runtime checks
out and leave these verification-time checks in. If, at runtime, those kfuncs
see a NULL btf_struct_meta, I'd rather they fail loudly in the future
with a NULL deref splat, than potentially leaking memory or similarly
subtle failures. WDYT?

I don't feel particularly strongly about these verification-time checks,
but the level of 'selective defensive programming' here feels similar to
other 'verifier internal error' checks sprinkled throughout verifier.c,
so that argument doesn't feel very persuasive to me.