Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Fri, 10 Feb 2023 20:38:47 +0100

On Fri, Feb 10, 2023 at 07:58:54PM CET, Alexei Starovoitov wrote:
> On Fri, Feb 10, 2023 at 07:03:46PM +0100, Kumar Kartikeya Dwivedi wrote:
> > On Fri, Feb 10, 2023 at 06:21:37PM CET, Alexei Starovoitov wrote:
> > > On Fri, Feb 10, 2023 at 02:55:41PM +0100, Kumar Kartikeya Dwivedi wrote:
> > > > On Thu, Feb 09, 2023 at 06:41:41PM CET, Dave Marchevsky wrote:
> > > > > Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
> > > > > that require handling in the verifier:
> > > > >
> > > > >   * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
> > > > >     the bpf_rb_node field, with the offset set to that field's offset,
> > > > >     instead of a struct bpf_rb_node *
> > > > >     * mark_reg_graph_node helper added in previous patch generalizes
> > > > >       this logic, use it
> > > > >
> > > > >   * bpf_rbtree_remove's node input is a node that's been inserted
> > > > >     in the tree - a non-owning reference.
> > > > >
> > > > >   * bpf_rbtree_remove must invalidate non-owning references in order to
> > > > >     avoid aliasing issue. Use previously-added
> > > > >     invalidate_non_owning_refs helper to mark this function as a
> > > > >     non-owning ref invalidation point.
> > > > >
> > > > >   * Unlike other functions, which convert one of their input arg regs to
> > > > >     non-owning reference, bpf_rbtree_first takes no arguments and just
> > > > >     returns a non-owning reference (possibly null)
> > > > >     * For now verifier logic for this is special-cased instead of
> > > > >       adding new kfunc flag.
> > > > >
> > > > > This patch, along with the previous one, complete special verifier
> > > > > handling for all rbtree API functions added in this series.
> > > > >
> > > >
> > > > I think there are two issues with the current approach. The fundamental
> > > > assumption with non-owning references is that it is part of the collection. So
> > > > bpf_rbtree_{add,first}, bpf_list_push_{front,back} will create them, as no node
> > > > is being removed from collection. Marking bpf_rbtree_remove (and in the future
> > > > bpf_list_del) as invalidation points is also right, since once a node has been
> > > > removed it is going to be unclear whether existing non-owning references have
> > > > the same value, and thus the property of 'part of the collection' will be
> > > > broken.
> > >
> > > correct, but the patch set does invalidate after bpf_rbtree_remove(),
> > > so it's not an issue.
> > >
> > > > The first issue relates to usability. If I have non-owning references to nodes
> > > > inserted into both a list and an rbtree, bpf_rbtree_remove should only
> > > > invalidate the ones that are part of the particular rbtree. It should have no
> > > > effect on others. Likewise for the bpf_list_del operation in the future.
> > > > Therefore, we need to track the collection identity associated with each
> > > > non-owning reference, then only invalidate non-owning references associated with
> > > > the same collection.
> > > >
> > > > The case of bpf_spin_unlock is different, which should invalidate all non-owning
> > > > references.
> > > >
> > > > The second issue is more serious. By not tracking the collection identity, we
> > > > will currently allow a non-owning reference for an object inserted into a list
> > > > to be passed to bpf_rbtree_remove, because the verifier cannot discern between
> > > > 'inserted into rbtree' vs 'inserted into list'. For it, both are currently
> > > > equivalent in the verifier state. An object is allowed to have both
> > > > bpf_list_node and bpf_rb_node, but it can only be part of one collection at a
> > > > time (because of no shared ownership).
> > > >
> > > > 	struct obj {
> > > > 		bpf_list_node ln;
> > > > 		bpf_rb_node rn;
> > > > 	};
> > > >
> > > > 	bpf_list_push_front(head, &obj->ln); // node is non-own-ref
> > > > 	bpf_rbtree_remove(&obj->rn); // should not work, but does
> > >
> > > Also correct, but inserting the same single owner node into rbtree and link list
> > > is not supported. Only 'shared ownership' node can be inserted into
> > > two collections.
> >
> > What is supported is having an object be part of a list and an rbtree one at a
> > time, which is what I'm talking about here. Shared ownership has nothing to do
> > with this.
>
> I wouldn't call it "supported".
> btf_parse_struct_metas() doesn't error on structs with bpf_list_node and bpf_rb_node in them.
> It allows two bpf_list_node-s as well.
> list_node+rb_node is broken as you said.
> I'm not sure two list_node-s is correct either. For the same reasons.

You mean it's incorrect before or after this series? If before, can you
elaborate? The node isn't usable with kfuncs after push before this series.

> So it's an omission rather than support.
>
> > > The check to disallow bpf_list_node and bpf_rb_node in the same obj
> > > can be a follow up patch to close this hole.
> > >
> >
> > Fine, that would also 'fix' this problem, where a non-owning reference part of a
> > list could be passed to bpf_rbtree_remove etc. If there can only be a list node
> > or an rbtree node in an object, such a case cannot be constructed. But I think
> > it's an awkward limitation.
>
> Regardless whether non-own refs exist or not. Two list_node-s are unusable
> with single owner model.
> struct obj {
> 	bpf_list_node ln;
> 	bpf_list_node ln2;
> };
>
> bpf_list_push_front(head, &obj->ln); // doesn't matter what we do with obj here
> bpf_list_push_front(head2, &obj->ln2); // cannot be made to work correctly
> The only way to actually support this case is to have refcount in obj.
>

Yes, I think we all agree it's not possible currently.

> It can "work" when obj is a part of only one list at a time, but one can reasonably
> argue that a requirement to have two 'bpf_list_node'-s in a obj just two insert obj
> in one list, pop it, and then insert into another list through a different list_node
> is nothing but a waste of memory.

I didn't get this. Unless I'm mistaken, you should be able to move objects
around from one list to another using the same single bpf_list_node. You don't
need two. Just that the list_head needs correct __contains annotation.

> The user should be able to use the same 'bpf_list_node' to insert into one
> list_head, remove it from there, and then insert into another list_head.
>

I think this already works.

> > > > So some notion of a collection identity needs to be constructed, the amount of
> > > > data which needs to be remembered in each non-owning reference's register state
> > > > depends on our requirements.
> > > >
> > > > The first sanity check is that bpf_rbtree_remove only removes something in an
> > > > rbtree, so probably an enum member indicating whether collection is a list or
> > > > rbtree. To ensure proper scoped invalidation, we will unfortunately need more
> > > > than just the reg->id of the reg holding the graph root, since map values of
> > > > different maps may have same id (0). Hence, we need id and ptr similar to the
> > > > active lock case for proper matching. Even this won't be enough, as there can be
> > > > multiple list or rbtree roots in a particular memory region, therefore the
> > > > offset also needs to be part of the collection identity.
> > > >
> > > > So it seems it will amount to:
> > > >
> > > > 	struct bpf_collection_id {
> > > > 		enum bpf_collection_type type;
> > > > 		void *ptr;
> > > > 		int id;
> > > > 		int off;
> > > > 	};
> > > >
> > > > There might be ways to optimize the memory footprint of this struct, but I'm
> > > > just trying to state why we'll need to include all four, so we don't miss out on
> > > > a corner case again.
> > >
> > > The trade-off doesn't feel right here. Tracking collection id complexity in
> > > the verifier for single owner case is not worth it imo.
> >
> > It was more about correctness after this set is applied than being worth it. We
> > could argue it's not worth it for the first issue which relates to usability.
> > Dave has already mentioned that point. But the second one is simply incorrect to
> > allow. As soon as you want an object which can be part of both a list and rbtree
> > at different times (e.g., an item usually part of an rbtree but which is popped
> > out and moved to a list for some processing), people will be able to trigger it.
>
> obj-in-a-list or obj-in-a-rbtree is technically possible, but I argue it's equally
> weird corner case to support. See above why two list_node-s for head-at-a-time are awkward.
>
> > Simply disallowing that (as you said above) is also an option. But whenever you
> > do allow it, you will need to add something like this. It has little to do with
> > whether shared ownership is supported or not.
>
> I think we will allow two list_nodes or list_node+rb_node or ten rb_nodes
> only when refcount is present as well and that would be 'shared ownership' case.

I get that you just want to punt it for now until someone has a use case, and
move forward.

But saying that people will have to use a refcount in their object just to be
able to have single unique ownership while allowing it to be part of multiple
data structures at different times is weird. It's like forcing someone to use a
shared_ptr in C++ (when they can use a unique_ptr) just because they want to
std::move it from one place to another.

I thought the whole point was building composable building blocks and letting
people choose their tradeoffs.