Re: [PATCH v4 bpf-next 08/11] bpf: Special verifier handling for bpf_rbtree_{remove, first}

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Fri, 10 Feb 2023 12:01:12 -0800

On Fri, Feb 10, 2023 at 08:38:47PM +0100, Kumar Kartikeya Dwivedi wrote:
> On Fri, Feb 10, 2023 at 07:58:54PM CET, Alexei Starovoitov wrote:
> > On Fri, Feb 10, 2023 at 07:03:46PM +0100, Kumar Kartikeya Dwivedi wrote:
> > > On Fri, Feb 10, 2023 at 06:21:37PM CET, Alexei Starovoitov wrote:
> > > > On Fri, Feb 10, 2023 at 02:55:41PM +0100, Kumar Kartikeya Dwivedi wrote:
> > > > > On Thu, Feb 09, 2023 at 06:41:41PM CET, Dave Marchevsky wrote:
> > > > > > Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
> > > > > > that require handling in the verifier:
> > > > > >
> > > > > >   * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
> > > > > >     the bpf_rb_node field, with the offset set to that field's offset,
> > > > > >     instead of a struct bpf_rb_node *
> > > > > >     * mark_reg_graph_node helper added in previous patch generalizes
> > > > > >       this logic, use it
> > > > > >
> > > > > >   * bpf_rbtree_remove's node input is a node that's been inserted
> > > > > >     in the tree - a non-owning reference.
> > > > > >
> > > > > >   * bpf_rbtree_remove must invalidate non-owning references in order to
> > > > > >     avoid aliasing issue. Use previously-added
> > > > > >     invalidate_non_owning_refs helper to mark this function as a
> > > > > >     non-owning ref invalidation point.
> > > > > >
> > > > > >   * Unlike other functions, which convert one of their input arg regs to
> > > > > >     non-owning reference, bpf_rbtree_first takes no arguments and just
> > > > > >     returns a non-owning reference (possibly null)
> > > > > >     * For now verifier logic for this is special-cased instead of
> > > > > >       adding new kfunc flag.
> > > > > >
> > > > > > This patch, along with the previous one, complete special verifier
> > > > > > handling for all rbtree API functions added in this series.
> > > > > >
> > > > >
> > > > > I think there are two issues with the current approach. The fundamental
> > > > > assumption with non-owning references is that it is part of the collection. So
> > > > > bpf_rbtree_{add,first}, bpf_list_push_{front,back} will create them, as no node
> > > > > is being removed from collection. Marking bpf_rbtree_remove (and in the future
> > > > > bpf_list_del) as invalidation points is also right, since once a node has been
> > > > > removed it is going to be unclear whether existing non-owning references have
> > > > > the same value, and thus the property of 'part of the collection' will be
> > > > > broken.
> > > >
> > > > correct, but the patch set does invalidate after bpf_rbtree_remove(),
> > > > so it's not an issue.
> > > >
> > > > > The first issue relates to usability. If I have non-owning references to nodes
> > > > > inserted into both a list and an rbtree, bpf_rbtree_remove should only
> > > > > invalidate the ones that are part of the particular rbtree. It should have no
> > > > > effect on others. Likewise for the bpf_list_del operation in the future.
> > > > > Therefore, we need to track the collection identity associated with each
> > > > > non-owning reference, then only invalidate non-owning references associated with
> > > > > the same collection.
> > > > >
> > > > > The case of bpf_spin_unlock is different, which should invalidate all non-owning
> > > > > references.
> > > > >
> > > > > The second issue is more serious. By not tracking the collection identity, we
> > > > > will currently allow a non-owning reference for an object inserted into a list
> > > > > to be passed to bpf_rbtree_remove, because the verifier cannot discern between
> > > > > 'inserted into rbtree' vs 'inserted into list'. For it, both are currently
> > > > > equivalent in the verifier state. An object is allowed to have both
> > > > > bpf_list_node and bpf_rb_node, but it can only be part of one collection at a
> > > > > time (because of no shared ownership).
> > > > >
> > > > > 	struct obj {
> > > > > 		bpf_list_node ln;
> > > > > 		bpf_rb_node rn;
> > > > > 	};
> > > > >
> > > > > 	bpf_list_push_front(head, &obj->ln); // node is non-own-ref
> > > > > 	bpf_rbtree_remove(&obj->rn); // should not work, but does
> > > >
> > > > Also correct, but inserting the same single owner node into rbtree and link list
> > > > is not supported. Only 'shared ownership' node can be inserted into
> > > > two collections.
> > >
> > > What is supported is having an object be part of a list and an rbtree one at a
> > > time, which is what I'm talking about here. Shared ownership has nothing to do
> > > with this.
> >
> > I wouldn't call it "supported".
> > btf_parse_struct_metas() doesn't error on structs with bpf_list_node and bpf_rb_node in them.
> > It allows two bpf_list_node-s as well.
> > list_node+rb_node is broken as you said.
> > I'm not sure two list_node-s is correct either. For the same reasons.
> 
> You mean it's incorrect before or after this series? If before, can you
> elaborate? The node isn't usable with kfuncs after push before this series.

I meant once we add non-own-refs and bpf_list_del. Something like:
struct obj {
	bpf_list_node ln;
	bpf_list_node ln2;
};

bpf_list_push_front(head, &obj->ln); // obj is non-own-ref
bpf_list_del(&obj->ln2); // should not work, but does

...details to fill in...
I'm arguing that bpf_list_node x 2 is the same danger zone as bpf_list_node + bpf_rb_node.
It's safe today only because link list is so limited.
I'm looking at full featured rb tree and link list with similar ops allowed in both.

> 
> > So it's an omission rather than support.
> >
> > > > The check to disallow bpf_list_node and bpf_rb_node in the same obj
> > > > can be a follow up patch to close this hole.
> > > >
> > >
> > > Fine, that would also 'fix' this problem, where a non-owning reference part of a
> > > list could be passed to bpf_rbtree_remove etc. If there can only be a list node
> > > or an rbtree node in an object, such a case cannot be constructed. But I think
> > > it's an awkward limitation.
> >
> > Regardless whether non-own refs exist or not. Two list_node-s are unusable
> > with single owner model.
> > struct obj {
> > 	bpf_list_node ln;
> > 	bpf_list_node ln2;
> > };
> >
> > bpf_list_push_front(head, &obj->ln); // doesn't matter what we do with obj here
> > bpf_list_push_front(head2, &obj->ln2); // cannot be made to work correctly
> > The only way to actually support this case is to have refcount in obj.
> >
> 
> Yes, I think we all agree it's not possible currently.
> 
> > It can "work" when obj is a part of only one list at a time, but one can reasonably
> > argue that a requirement to have two 'bpf_list_node'-s in a obj just two insert obj
> > in one list, pop it, and then insert into another list through a different list_node
> > is nothing but a waste of memory.
> 
> I didn't get this. Unless I'm mistaken, you should be able to move objects
> around from one list to another using the same single bpf_list_node. You don't
> need two. Just that the list_head needs correct __contains annotation.
> 
> > The user should be able to use the same 'bpf_list_node' to insert into one
> > list_head, remove it from there, and then insert into another list_head.
> >
> 
> I think this already works.

It could be. We don't have tests for it, so it's a guess.

> 
> > > > > So some notion of a collection identity needs to be constructed, the amount of
> > > > > data which needs to be remembered in each non-owning reference's register state
> > > > > depends on our requirements.
> > > > >
> > > > > The first sanity check is that bpf_rbtree_remove only removes something in an
> > > > > rbtree, so probably an enum member indicating whether collection is a list or
> > > > > rbtree. To ensure proper scoped invalidation, we will unfortunately need more
> > > > > than just the reg->id of the reg holding the graph root, since map values of
> > > > > different maps may have same id (0). Hence, we need id and ptr similar to the
> > > > > active lock case for proper matching. Even this won't be enough, as there can be
> > > > > multiple list or rbtree roots in a particular memory region, therefore the
> > > > > offset also needs to be part of the collection identity.
> > > > >
> > > > > So it seems it will amount to:
> > > > >
> > > > > 	struct bpf_collection_id {
> > > > > 		enum bpf_collection_type type;
> > > > > 		void *ptr;
> > > > > 		int id;
> > > > > 		int off;
> > > > > 	};
> > > > >
> > > > > There might be ways to optimize the memory footprint of this struct, but I'm
> > > > > just trying to state why we'll need to include all four, so we don't miss out on
> > > > > a corner case again.
> > > >
> > > > The trade-off doesn't feel right here. Tracking collection id complexity in
> > > > the verifier for single owner case is not worth it imo.
> > >
> > > It was more about correctness after this set is applied than being worth it. We
> > > could argue it's not worth it for the first issue which relates to usability.
> > > Dave has already mentioned that point. But the second one is simply incorrect to
> > > allow. As soon as you want an object which can be part of both a list and rbtree
> > > at different times (e.g., an item usually part of an rbtree but which is popped
> > > out and moved to a list for some processing), people will be able to trigger it.
> >
> > obj-in-a-list or obj-in-a-rbtree is technically possible, but I argue it's equally
> > weird corner case to support. See above why two list_node-s for head-at-a-time are awkward.
> >
> > > Simply disallowing that (as you said above) is also an option. But whenever you
> > > do allow it, you will need to add something like this. It has little to do with
> > > whether shared ownership is supported or not.
> >
> > I think we will allow two list_nodes or list_node+rb_node or ten rb_nodes
> > only when refcount is present as well and that would be 'shared ownership' case.
> 
> I get that you just want to punt it for now until someone has a use case, and
> move forward.
> 
> But saying that people will have to use a refcount in their object just to be
> able to have single unique ownership while allowing it to be part of multiple
> data structures at different times is weird. It's like forcing someone to use a
> shared_ptr in C++ (when they can use a unique_ptr) just because they want to
> std::move it from one place to another.
> 
> I thought the whole point was building composable building blocks and letting
> people choose their tradeoffs.

That's exactly what we're doing. rb-tree, link lists are building blocks.
If we can improve their UX we will certainly do it. I'm arguing that we don't
have a real user yet, so whether UX is good or bad is subjective.

For example in this thread and never before we looked critically at kptrs
and kptr_xchg. The first time people tried to use for real they said it sucks.
That's a feedback that we should listen to instead of our own believes.
We try to strike a balance between feature being useful while keeping the verifier
complexity to the minimum. Then deliver the feature into the hands of users and
iterate based on their feedback.