On 12/7/22 5:46 PM, Alexei Starovoitov wrote:
> On Wed, Dec 07, 2022 at 03:38:55PM -0500, Dave Marchevsky wrote:
>> On 12/7/22 1:59 PM, Alexei Starovoitov wrote:
>>> On Wed, Dec 07, 2022 at 01:34:44PM -0500, Dave Marchevsky wrote:
>>>> On 12/7/22 11:41 AM, Kumar Kartikeya Dwivedi wrote:
>>>>> On Wed, Dec 07, 2022 at 04:39:48AM IST, Dave Marchevsky wrote:
>>>>>> btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c.
>>>>>> There, a BTF record is created for any type containing a spin_lock or
>>>>>> any next-gen datastructure node/head.
>>>>>>
>>>>>> Currently, for non-MAP_VALUE types, reg_btf_record will only search for
>>>>>> a record using struct_meta_tab if the reg->type exactly matches
>>>>>> (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an
>>>>>> "allocated obj" type - returned from bpf_obj_new - might pick up other
>>>>>> flags while working its way through the program.
>>>>>>
>>>>>
>>>>> Not following. Only PTR_TO_BTF_ID | MEM_ALLOC is the valid reg->type that can be
>>>>> passed to helpers. reg_btf_record is used in helpers to inspect the btf_record.
>>>>> Any other flag combination (the only one possible is PTR_UNTRUSTED right now)
>>>>> cannot be passed to helpers in the first place. The reason to set PTR_UNTRUSTED
>>>>> is to make them unpassable to helpers.
>>>>>
>>>>
>>>> I see what you mean. If reg_btf_record is only used on regs which are args,
>>>> then the exact match helps enforce PTR_UNTRUSTED not being an acceptable
>>>> type flag for an arg. Most uses of reg_btf_record seem to be on arg regs,
>>>> but then we have its use in reg_may_point_to_spin_lock, which is itself
>>>> used in mark_ptr_or_null_reg and on BPF_REG_0 in check_kfunc_call. So I'm not
>>>> sure that it's only used on arg regs currently.
>>>>
>>>> Regardless, if the intended use is on arg regs only, it should be renamed to
>>>> arg_reg_btf_record or similar to make that clear, as the current name sounds like
>>>> it should be applicable to any reg, and thus not enforce constraints particular
>>>> to arg regs.
>>>>
>>>> But I think it's better to leave it general and enforce those constraints
>>>> elsewhere. For kfuncs this is already happening in check_kfunc_args, where the
>>>> big switch statements for KF_ARG_* are doing exact type matching.
>>>>
>>>>>> Loosen the check to be exact for base_type and just use MEM_ALLOC mask
>>>>>> for type_flag.
>>>>>>
>>>>>> This patch is marked Fixes as the original intent of reg_btf_record was
>>>>>> unlikely to have been to fail finding btf_record for valid alloc obj
>>>>>> types with additional flags, some of which (e.g. PTR_UNTRUSTED)
>>>>>> are valid register type states for alloc obj independent of this series.
>>>>>
>>>>> That was the actual intent, same as how check_ptr_to_btf_access uses the exact
>>>>> reg->type to allow the BPF_WRITE case.
>>>>>
>>>>> I think this series is the one introducing this case, passing bpf_rbtree_first's
>>>>> result to bpf_rbtree_remove, which I think is not possible to make safe in the
>>>>> first place. We decided to do bpf_list_pop_front instead of bpf_list_entry ->
>>>>> bpf_list_del due to this exact issue. More in [0].
>>>>>
>>>>> [0]: https://lore.kernel.org/bpf/CAADnVQKifhUk_HE+8qQ=AOhAssH6w9LZ082Oo53rwaS+tAGtOw@xxxxxxxxxxxxxx
>>>>>
>>>>
>>>> Thanks for the link, I better understand what Alexei meant in his comment on
>>>> patch 9 of this series.
>>>>
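For reference, the relaxation described in the commit message quoted at the top
amounts to roughly the following in reg_btf_record() - an illustrative sketch
rather than the literal diff, reusing the existing base_type(), type_flag() and
btf_find_struct_meta() helpers:

  static struct btf_record *reg_btf_record(const struct bpf_reg_state *reg)
  {
          struct btf_record *rec = NULL;
          struct btf_struct_meta *meta;

          if (reg->type == PTR_TO_MAP_VALUE) {
                  rec = reg->map_ptr->record;
          } else if (base_type(reg->type) == PTR_TO_BTF_ID &&
                     type_flag(reg->type) & MEM_ALLOC) {
                  /* Match on base_type + MEM_ALLOC instead of requiring
                   * reg->type == (PTR_TO_BTF_ID | MEM_ALLOC) exactly, so an
                   * alloc obj that picked up extra flags - e.g. PTR_UNTRUSTED -
                   * still resolves its btf_record.
                   */
                  meta = btf_find_struct_meta(reg->btf, reg->btf_id);
                  if (meta)
                          rec = meta->record;
          }
          return rec;
  }

The kfunc-arg constraints discussed above would still be enforced by the exact
matching in check_kfunc_args.
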
>>>> For the helpers added in this series, we can make
>>>> bpf_rbtree_first -> bpf_rbtree_remove safe by invalidating all release_on_unlock
>>>> refs after the rbtree_remove in the same manner as they're invalidated after
>>>> spin_unlock currently.
>>>>
>>>> Logic for why this is safe:
>>>>
>>>> * If we have two non-owning refs to nodes in a tree, e.g. from
>>>>   bpf_rbtree_add(node) and calling bpf_rbtree_first() immediately after,
>>>>   we have no way of knowing if they're aliases of the same node.
>>>>
>>>> * If bpf_rbtree_remove takes an arbitrary non-owning ref to a node in the tree,
>>>>   it might be removing a node that's already been removed, e.g.:
>>>>
>>>>     n = bpf_obj_new(...);
>>>>     bpf_spin_lock(&lock);
>>>>
>>>>     bpf_rbtree_add(&tree, &n->node);
>>>>     // n is now non-owning ref to node which was added
>>>>     res = bpf_rbtree_first(&tree);
>>>>     if (!res) {}
>>>>     m = container_of(res, struct node_data, node);
>>>>     // m is now non-owning ref to the same node
>>>>     bpf_rbtree_remove(&tree, &n->node);
>>>>     bpf_rbtree_remove(&tree, &m->node); // BAD
>>>
>>> Let me clarify my previous email:
>>>
>>> Above doesn't have to be 'BAD'.
>>> Instead of
>>>   if (WARN_ON_ONCE(RB_EMPTY_NODE(n)))
>>>
>>> we can drop the WARN and simply return.
>>> If the node is not part of the tree -> nop.
>>>
>>> Same for bpf_rbtree_add.
>>> If it's already added -> nop.
>>>
>>
>> These runtime checks can certainly be done, but if we can guarantee via the
>> verifier type system that a particular ptr-to-node is guaranteed to be in /
>> not be in a tree, that's better, no?
>>
>> Feels like a similar train of thought to "fail verification when the correct rbtree
>> lock isn't held" vs "just check if the lock is held in every rbtree API kfunc".
>>
>>> Then we can have bpf_rbtree_first() returning PTR_TRUSTED with acquire semantics.
>>> We do all these checks under the same rbtree root lock, so it's safe.
>>>
>>
>> I'll comment on PTR_TRUSTED in our discussion on patch 10.
>>
>>>>     bpf_spin_unlock(&lock);
>>>>
>>>> * bpf_rbtree_remove is the only "pop()" currently. Non-owning refs are at risk
>>>>   of pointing to something that was already removed _only_ after a
>>>>   rbtree_remove, so if we invalidate them all after rbtree_remove they can't
>>>>   be inputs to subsequent remove()s.
>>>
>>> With the above proposed run-time checks both bpf_rbtree_remove and bpf_rbtree_add
>>> can have release semantics.
>>> No need for special release_on_unlock hacks.
>>>
>>
>> If we want to be able to interact w/ nodes after they've been added to the
>> rbtree, but before the critical section ends, we need to support non-owning refs,
>> which are currently implemented using special release_on_unlock logic.
>>
>> If we go with the runtime check suggestion from above, we'd need to implement
>> 'conditional release' similarly to the earlier "rbtree map" attempt:
>> https://lore.kernel.org/bpf/20220830172759.4069786-14-davemarchevsky@xxxxxx/ .
>>
>> If rbtree_add has release semantics for its node arg, but the node is already
>> in some tree and the runtime check fails, the reference should not be released as
>> rbtree_add() was a nop.
>
> Got it.
> The conditional release is tricky. We should probably avoid it for now.
>
> I think we can either go with Kumar's proposal and do
> bpf_rbtree_pop_front() instead of bpf_rbtree_first(),
> which avoids all these issues...
>
> but considering that we'll have inline iterators soon and should be able to do:
>
>   struct bpf_rbtree_iter it;
>   struct bpf_rb_node *node;
>
>   bpf_rbtree_iter_init(&it, rb_root); // locks the rbtree
>   while ((node = bpf_rbtree_iter_next(&it))) {
>       if (node->field == condition) {
>           struct bpf_rb_node *n;
>
>           n = bpf_rbtree_remove(rb_root, node);
>           bpf_spin_lock(another_rb_root);
>           bpf_rbtree_add(another_rb_root, n);
>           bpf_spin_unlock(another_rb_root);
>           break;
>       }
>   }
>   bpf_rbtree_iter_destroy(&it);
>
> We can treat the 'node' returned from bpf_rbtree_iter_next() the same way
> as the return from bpf_rbtree_first() -> PTR_TRUSTED | MAYBE_NULL,
> but not acquired (ref_obj_id == 0).
>
> bpf_rbtree_add -> KF_RELEASE,
> so we cannot pass not-acquired pointers into it.
>
> We should probably remove the release_on_unlock logic as Kumar is suggesting and
> make bpf_list_push_front/back KF_RELEASE.
>
> Then
> bpf_list_pop_front/back stay KF_ACQUIRE | KF_RET_NULL
> and
> bpf_rbtree_remove is also KF_ACQUIRE | KF_RET_NULL.
>
> The difference is bpf_list_pop has only 'head',
> while bpf_rbtree_remove has 'root' and 'node', where 'node' has to be PTR_TRUSTED
> (but not acquired).
>
> bpf_rbtree_add will always succeed.
> bpf_rbtree_remove will conditionally fail if 'node' is not linked.
>
> Similarly we can extend the linked list with
>   n = bpf_list_remove(node)
> which will have KF_ACQUIRE | KF_RET_NULL semantics.
>
> Then everything is nicely uniform.
> We'll be able to iterate rbtrees and iterate linked lists.
>
> There are downsides, of course.
> Like the following from your test case:
> + bpf_spin_lock(&glock);
> + bpf_rbtree_add(&groot, &n->node, less);
> + bpf_rbtree_add(&groot, &m->node, less);
> + res = bpf_rbtree_remove(&groot, &n->node);
> + bpf_spin_unlock(&glock);
> will not work,
> since bpf_rbtree_add() releases 'n' and it becomes UNTRUSTED
> (assuming release_on_unlock is removed).
>
> I think it's fine for now. I have to agree with Kumar that it's hard to come up
> with a realistic use case where 'n' should be accessed after it was added to a
> linked list or rbtree. The above test case doesn't look real.
>
> This part of your test case:
> + bpf_spin_lock(&glock);
> + bpf_rbtree_add(&groot, &n->node, less);
> + bpf_rbtree_add(&groot, &m->node, less);
> + bpf_rbtree_add(&groot, &o->node, less);
> +
> + res = bpf_rbtree_first(&groot);
> + if (!res) {
> +         bpf_spin_unlock(&glock);
> +         return 2;
> + }
> +
> + o = container_of(res, struct node_data, node);
> + res = bpf_rbtree_remove(&groot, &o->node);
> + bpf_spin_unlock(&glock);
>
> will work, because bpf_rbtree_first returns PTR_TRUSTED | MAYBE_NULL.
>
>> Similarly, if rbtree_remove has release semantics for its node arg and acquire
>> semantics for its return value, the runtime check failing should result in the
>> node arg not being released. Acquire semantics for the retval are already
>> conditional - if retval == NULL, mark_ptr_or_null_regs will release the
>> acquired ref before it can be used. So no issue with a failing rbtree_remove
>> messing up acquire.
>>
>> For this reason rbtree_remove and rbtree_first are tagged
>> KF_ACQUIRE | KF_RET_NULL. The "special release_on_unlock hacks" can likely be
>> refactored into a similar flag, KF_RELEASE_NON_OWN or similar.
>
> I guess what I'm proposing above is sort of the KF_RELEASE_NON_OWN idea,
> but from a different angle.
> I'd like to avoid introducing new flags.
> I think PTR_TRUSTED is enough.
>
>>> I'm not sure what the idea is of returning 'n' from remove...
>>> Maybe it should be a simple bool?
>>>
>>
>> I agree that returning the node from rbtree_remove is not strictly necessary, since
>> rbtree_remove can be thought of as turning its non-owning ref argument into an
>> owning ref, instead of taking a non-owning ref and returning an owning ref. But such
>> an operation isn't really an 'acquire' by current verifier logic, since only
>> retvals can be 'acquired'. So we'd need to add some logic to enable acquire
>> semantics for args. Furthermore it's not really 'acquiring' a new ref, rather
>> changing properties of the node arg's ref.
>>
>> However, if rbtree_remove can fail, such a "turn non-owning into owning"
>> operation will need to be able to fail as well, and the program will need to
>> be able to check for failure. Returning the 'acquire' result in the retval makes
>> this simple - just check for NULL. For your "return bool" proposal, we'd have
>> to add verifier logic which turns the 'acquired' owning ref back into non-owning
>> based on a check of the bool, which will add some verifier complexity.
>>
>> IIRC when experimenting with the "rbtree map" implementation, I did
>> something like this and decided that the additional complexity wasn't worth
>> it when the retval can just be used.
>
> Agree. Forget the 'bool' idea.

We will merge this convo w/ the similar one in the cover letter's thread, and
continue w/ replies there.
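Before we move over there - to make the proposed semantics concrete from the
program side, I'd expect the end state to look roughly like the below. This is
only a sketch: it assumes the node_data / groot / glock / less() definitions
from the selftests in this series plus the kfunc declarations the series uses
(bpf_obj_new, bpf_obj_drop, bpf_rbtree_add/first/remove), and the program name
is made up.

  SEC("tc")
  long rbtree_first_then_remove(void *ctx)
  {
          struct node_data *n, *first;
          struct bpf_rb_node *res;

          n = bpf_obj_new(typeof(*n));
          if (!n)
                  return 1;

          bpf_spin_lock(&glock);
          /* add has release semantics: 'n' is no longer trusted after this */
          bpf_rbtree_add(&groot, &n->node, less);

          /* first returns PTR_TRUSTED | MAYBE_NULL, not acquired */
          res = bpf_rbtree_first(&groot);
          if (!res) {
                  bpf_spin_unlock(&glock);
                  return 2;
          }
          first = container_of(res, struct node_data, node);

          /* remove is KF_ACQUIRE | KF_RET_NULL: on success we own the node */
          res = bpf_rbtree_remove(&groot, &first->node);
          bpf_spin_unlock(&glock);

          if (res) {
                  first = container_of(res, struct node_data, node);
                  bpf_obj_drop(first);
          }
          return 0;
  }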