Re: [PATCH bpf-next 01/13] bpf: Loosen alloc obj test in verifier's reg_btf_record

Dave Marchevsky <davemarchevsky@xxxxxxxx> · Wed, 7 Dec 2022 15:38:55 -0500

On 12/7/22 1:59 PM, Alexei Starovoitov wrote:
> On Wed, Dec 07, 2022 at 01:34:44PM -0500, Dave Marchevsky wrote:
>> On 12/7/22 11:41 AM, Kumar Kartikeya Dwivedi wrote:
>>> On Wed, Dec 07, 2022 at 04:39:48AM IST, Dave Marchevsky wrote:
>>>> btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
>>>> There, a BTF record is created for any type containing a spin_lock or
>>>> any next-gen datastructure node/head.
>>>>
>>>> Currently, for non-MAP_VALUE types, reg_btf_record will only search for
>>>> a record using struct_meta_tab if the reg->type exactly matches
>>>> (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
>>>> "allocated obj" type - returned from bpf_obj_new - might pick up other
>>>> flags while working its way through the program.
>>>>
>>>
>>> Not following. Only PTR_TO_BTF_ID | MEM_ALLOC is the valid reg->type that can be
>>> passed to helpers. reg_btf_record is used in helpers to inspect the btf_record.
>>> Any other flag combination (the only one possible is PTR_UNTRUSTED right now)
>>> cannot be passed to helpers in the first place. The reason to set PTR_UNTRUSTED
>>> is to make then unpassable to helpers.
>>>
>>
>> I see what you mean. If reg_btf_record is only used on regs which are args,
>> then the exact match helps enforce PTR_UNTRUSTED not being an acceptable
>> type flag for an arg. Most uses of reg_btf_record seem to be on arg regs,
>> but then we have its use in reg_may_point_to_spin_lock, which is itself
>> used in mark_ptr_or_null_reg and on BPF_REG_0 in check_kfunc_call. So I'm not
>> sure that it's only used on arg regs currently.
>>
>> Regardless, if the intended use is on arg regs only, it should be renamed to
>> arg_reg_btf_record or similar to make that clear, as current name sounds like
>> it should be applicable to any reg, and thus not enforce constraints particular
>> to arg regs.
>>
>> But I think it's better to leave it general and enforce those constraints
>> elsewhere. For kfuncs this is already happening in check_kfunc_args, where the
>> big switch statements for KF_ARG_* are doing exact type matching.
>>
>>>> Loosen the check to be exact for base_type and just use MEM_ALLOC mask
>>>> for type_flag.
>>>>
>>>> This patch is marked Fixes as the original intent of reg_btf_record was
>>>> unlikely to have been to fail finding btf_record for valid alloc obj
>>>> types with additional flags, some of which (e.g. PTR_UNTRUSTED)
>>>> are valid register type states for alloc obj independent of this series.
>>>
>>> That was the actual intent, same as how check_ptr_to_btf_access uses the exact
>>> reg->type to allow the BPF_WRITE case.
>>>
>>> I think this series is the one introducing this case, passing bpf_rbtree_first's
>>> result to bpf_rbtree_remove, which I think is not possible to make safe in the
>>> first place. We decided to do bpf_list_pop_front instead of bpf_list_entry ->
>>> bpf_list_del due to this exact issue. More in [0].
>>>
>>>  [0]: https://lore.kernel.org/bpf/CAADnVQKifhUk_HE+8qQ=AOhAssH6w9LZ082Oo53rwaS+tAGtOw@xxxxxxxxxxxxxx
>>>
>>
>> Thanks for the link, I better understand what Alexei meant in his comment on
>> patch 9 of this series. For the helpers added in this series, we can make
>> bpf_rbtree_first -> bpf_rbtree_remove safe by invalidating all release_on_unlock
>> refs after the rbtree_remove in same manner as they're invalidated after
>> spin_unlock currently.
>>
>> Logic for why this is safe:
>>
>>   * If we have two non-owning refs to nodes in a tree, e.g. from
>>     bpf_rbtree_add(node) and calling bpf_rbtree_first() immediately after,
>>     we have no way of knowing if they're aliases of same node.
>>
>>   * If bpf_rbtree_remove takes arbitrary non-owning ref to node in the tree,
>>     it might be removing a node that's already been removed, e.g.:
>>
>>         n = bpf_obj_new(...);
>>         bpf_spin_lock(&lock);
>>
>>         bpf_rbtree_add(&tree, &n->node);
>>         // n is now non-owning ref to node which was added
>>         res = bpf_rbtree_first();
>>         if (!m) {}
>>         m = container_of(res, struct node_data, node);
>>         // m is now non-owning ref to the same node
>>         bpf_rbtree_remove(&tree, &n->node);
>>         bpf_rbtree_remove(&tree, &m->node); // BAD
> 
> Let me clarify my previous email:
> 
> Above doesn't have to be 'BAD'.
> Instead of
> if (WARN_ON_ONCE(RB_EMPTY_NODE(n)))
> 
> we can drop WARN and simply return.
> If node is not part of the tree -> nop.
> 
> Same for bpf_rbtree_add.
> If it's already added -> nop.
> 

These runtime checks can certainly be done, but if we can guarantee via
verifier type system that a particular ptr-to-node is guaranteed to be in /
not be in a tree, that's better, no?

Feels like a similar train of thought to "fail verification when correct rbtree
lock isn't held" vs "just check if lock is held in every rbtree API kfunc".

> Then we can have bpf_rbtree_first() returning PTR_TRUSTED with acquire semantics.
> We do all these checks under the same rbtree root lock, so it's safe.
> 

I'll comment on PTR_TRUSTED in our discussion on patch 10.

>>         bpf_spin_unlock(&lock);
>>
>>   * bpf_rbtree_remove is the only "pop()" currently. Non-owning refs are at risk
>>     of pointing to something that was already removed _only_ after a
>>     rbtree_remove, so if we invalidate them all after rbtree_remove they can't
>>     be inputs to subsequent remove()s
> 
> With above proposed run-time checks both bpf_rbtree_remove and bpf_rbtree_add
> can have release semantics.
> No need for special release_on_unlock hacks.
> 

If we want to be able to interact w/ nodes after they've been added to the
rbtree, but before critical section ends, we need to support non-owning refs,
which are currently implemented using special release_on_unlock logic.

If we go with the runtime check suggestion from above, we'd need to implement
'conditional release' similarly to earlier "rbtree map" attempt:
https://lore.kernel.org/bpf/20220830172759.4069786-14-davemarchevsky@xxxxxx/ .

If rbtree_add has release semantics for its node arg, but the node is already
in some tree and runtime check fails, the reference should not be released as
rbtree_add() was a nop.

Similarly, if rbtree_remove has release semantics for its node arg and acquire
semantics for its return value, runtime check failing should result in the
node arg not being released. Acquire semantics for the retval are already
conditional - if retval == NULL, mark_ptr_or_null regs will release the
acquired ref before it can be used. So no issue with failing rbtree_remove
messing up acquire.

For this reason rbtree_remove and rbtree_first are tagged
KF_ACQUIRE | KF_RET_NULL. "special release_on_unlock hacks" can likely be
refactored into a similar flag, KF_RELEASE_NON_OWN or similar.

>> This does conflate current "release non-owning refs because it's not safe to
>> read from them" reasoning with new "release non-owning refs so they can't be
>> passed to remove()". Ideally we could add some new tag to these refs that
>> prevents them from being passed to remove()-type fns, but does allow them to
>> be read, e.g.:
>>
>>   n = bpf_obj_new(...);
> 
> 'n' is acquired.
> 
>>   bpf_spin_lock(&lock);
>>
>>   bpf_rbtree_add(&tree, &n->node);
>>   // n is now non-owning ref to node which was added
> 
> since bpf_rbtree_add does release on 'n'...
> 
>>   res = bpf_rbtree_first();
>>   if (!m) {}
>>   m = container_of(res, struct node_data, node);
>>   // m is now non-owning ref to the same node
> 
> ... below is not allowed by the verifier.
>>   n = bpf_rbtree_remove(&tree, &n->node);
> 
> I'm not sure what's an idea to return 'n' from remove...
> Maybe it should be simple bool ?
> 

I agree that returning node from rbtree_remove is not strictly necessary, since
rbtree_remove can be thought of turning its non-owning ref argument into an
owning ref, instead of taking non-owning ref and returning owning ref. But such
an operation isn't really an 'acquire' by current verifier logic, since only
retvals can be 'acquired'. So we'd need to add some logic to enable acquire
semantics for args. Furthermore it's not really 'acquiring' a new ref, rather
changing properties of node arg ref.

However, if rbtree_remove can fail, such a "turn non-owning into owning"
operation will need to be able to fail as well, and the program will need to
be able to check for failure. Returning 'acquire' result in retval makes
this simple - just check for NULL. For your "return bool" proposal, we'd have
to add verifier logic which turns the 'acquired' owning ref back into non-owning
based on check of the bool, which will add some verifier complexity.

IIRC when doing experimentation with "rbtree map" implementation, I did
something like this and decided that the additional complexity wasn't worth
it when retval can just be used. 

>>   // n is now owning ref again, m is non-owning ref to same node
>>   x = m->key; // this should be safe since we're still in CS
> 
> below works because 'm' cames from bpf_rbtree_first that acquired 'res'.
> 
>>   bpf_rbtree_remove(&tree, &m->node); // But this should be prevented
>>
>>   bpf_spin_unlock(&lock);
>>