Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}

Dave Marchevsky <davemarchevsky@xxxxxxxx> · Fri, 10 Feb 2023 03:22:43 -0500

On 2/9/23 10:11 PM, Alexei Starovoitov wrote:
> On Thu, Feb 09, 2023 at 09:41:41AM -0800, Dave Marchevsky wrote:
>> @@ -9924,11 +9934,12 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>  				   meta.func_id == special_kfunc_list[KF_bpf_list_pop_back]) {
>>  				struct btf_field *field = meta.arg_list_head.field;
>>  
>> -				mark_reg_known_zero(env, regs, BPF_REG_0);
>> -				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
>> -				regs[BPF_REG_0].btf = field->graph_root.btf;
>> -				regs[BPF_REG_0].btf_id = field->graph_root.value_btf_id;
>> -				regs[BPF_REG_0].off = field->graph_root.node_offset;
>> +				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
>> +			} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_remove] ||
>> +				   meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
>> +				struct btf_field *field = meta.arg_rbtree_root.field;
>> +
>> +				mark_reg_graph_node(regs, BPF_REG_0, &field->graph_root);
>>  			} else if (meta.func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) {
>>  				mark_reg_known_zero(env, regs, BPF_REG_0);
>>  				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
>> @@ -9994,7 +10005,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>  			if (is_kfunc_ret_null(&meta))
>>  				regs[BPF_REG_0].id = id;
>>  			regs[BPF_REG_0].ref_obj_id = id;
>> +		} else if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_first]) {
>> +			ref_set_non_owning_lock(env, &regs[BPF_REG_0]);
>>  		}
> 
> Looking at the above code where R0 state is set across two different if-s
> it feels that bool non_owning_ref_lock from patch 2 shouldn't be a bool.

Re: "set across two different if-s" - I see what you mean, and the fact that
both are doing 'meta.func_id == whatever' checks doesn't make it clear why
they're separate. But note that above the else if that the second check
is adding is "if (is_kfunc_acquire(&meta))" check, acquire_reference_state, etc.

"Is function acquire" is a function-level property and, as the kfunc flags I
tried to add in previous versions of this series indicate, I think that
"returns a non-owning reference" and "need to invalidate non-owning refs"
are function-level properties as well.

As a contrast, the first addition - with mark_reg_graph_node - is more of a
return-type-level property. Instead of doing meta.func_id checks in that change,
we could instead assume that any kfunc returning "bpf_rb_node *" is actually
returning a graph node type w/ bpf_rb_node field. It's certainly a blurry line
in this case since it's necessary to peek at the bpf_rb_root arg in order to
provide info about the node type to mark_reg_graph_node. But this is similar
to RET_PTR_TO_MAP_VALUE logic which requires a bpf_map * to provide info about
the map value being returned.

Why does this distinction matter at all? Because I'd like to eventually merge
helper and kfunc verification as much as possible / reasonable, especially the
different approaches to func_proto-like logic. Currently, the bpf_func_proto
approach used by bpf helpers is better at expressing 
{arg,return}-type level properties. A helper func_proto can do

  .arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL | OBJ_RELEASE,

and it's obvious which arg is being released, whereas kfunc equivalent is
KF_RELEASE flag on the kfunc itself and verifier needs to assume that there's
a single arg w/ ref_obj_id which is being released. Sure, kfunc annotations
(e.g. __sz, __alloc) could be extended to support all of this, but that's
not the current state of things, and such name suffixes wouldn't work for
retval.

Similarly, current kfunc definition scheme is better at expressing function-
level properties:

  BTF_ID_FLAGS(func, whatever, KF_ACQUIRE)

There's no func_proto equivalent, the is_acquire_function helper used in
check_helper_call resorts to "func_id ==" checks. For acquire specifically
it could be faked with a OBJ_ACQUIRE flag on retval in the proto, but I
don't know if the same would make sense for "need to invalidate non-owning
refs" or something like KF_TRUSTED_ARGS.

Anyways, this was a long-winded way of saying that separating this logic across
two different if-s was intentional and will help with future refactoring.

> Patch 7 also has this split initialization of the reg state.
> First it does mark_reg_graph_node() which sets regs[regno].type = PTR_TO_BTF_ID | MEM_ALLOC
> and then it does ref_set_non_owning_lock() that sets that bool flag.
> Setting PTR_TO_BTF_ID | MEM_ALLOC in the helper without setting ref_obj_id > 0
> at the same time feels error prone.

It's unfortunate that the reg type isn't really complete for rbtree_first
until after the second chunk of code, but this was already happening with
bpf_list_pop_{front,back}, which rely on KF_ACQUIRE flag and
is_kfunc_acquire check to set ref_obj_id on the popped owning reference.

Maybe to assuage your 'error prone' concern some check can be added at
the end of check_kfunc_call which ensures that PTR_TO_BTF_ID | MEM_ALLOC
types are properly configured, and dies with 'verifier internal error'
if not. I'm not convinced it's necessary, but regardless it would be
similar to commit 47e34cb74d37 ("bpf: Add verifier check for BPF_PTR_POISON retval and arg")
which I added a few months ago.

> This non_owning_ref_lock bool flag is actually a just flag.
> I think it would be cleaner to make it similar to MEM_ALLOC and call it
> NON_OWN_REF = BIT(14 + BPF_BASE_TYPE_BITS).
> 
> Then we can set it at once in this patch and in patch 7 and avoid this split init.
> The check in patch 2 also will become cleaner.
> Instead of:
> if (type_is_ptr_alloc_obj(reg->type) && reg->non_owning_ref_lock)
> it will be
> if (reg->type == PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF)

Minor tangent: this is why I like helpers like type_is_ptr_alloc_obj - they make
it obvious what properties a reg should have to be considered a certain type by
the verifier, and provide more context as to what specific type check is
happening vs a raw check.

IMO the cleanest version of that check would be if(reg_is_non_owning_ref(reg))
with the newly-added helper doing what you'd expect.

> 
> the transition from owning to non-owning would be easier to follow as well:
> PTR_TO_BTF_ID | MEM_ALLOC with ref_obj_id > 0
>  -> 
>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF with ref_obj_id == 0
> 
> and it will probably help to avoid bugs where PTR_TO_BTF_ID | MEM_ALLOC is accepted
> but we forgot to check ref_obj_id. There are no such places now, but it feels
> less error prone with proper flag instead of bool.

I'm not strongly opposed to a NON_OWN_REF type flag. It's a more granular
version of the KF_RELEASE_NON_OWN flag which I tried to add in a previous
version of this series. But some comments:

* Such a flag would eliminate the need to change bpf_reg_state in this
  series, but I think this will be temporary. If we add support for nested
  locks we'll need to reintroduce some "lock identity" again. If we want
  to improve UX for non-owning reference invalidation in the case where
  a list_head and rb_root share the same lock, we'll need to introduce some
  "datastructure root identity" to allow invalidation of only the list's
  non-owning refs on list_pop.

* Sure, we could have both the NON_OWN_REF flag and additional {lock,root}
  identity structures. But there isn't infinite room for type flags and
  currently non-owning ref concept is contained to just two data structures.
  IMO in terms of generality this flag is closer to MEM_RINGBUF than
  PTR_MAYBE_NULL. If we're going to need {lock,root} identity structs
  and can use them to disambiguate between owning/non-owning refs quickly,
  why bother with an extra flag?

* Followup to earlier func_proto rant: I can't use NON_OWN_REF flag to tag
  a kfunc retval currently. So it'll really only be a verifier-internal flag.
  At that point, might as well add reg_is_non_owning_ref for canonical way
  of checking this.

> I would also squash patches 1 and 2. Since we've analyzed correctness of patch 2 well enough
> it doesn't make sense to go through the churn in patch 1 just to delete it in patch 2.
> 

Ack.

> wdyt?