Amery Hung <ameryhung@xxxxxxxxx> writes: > On Fri, Mar 1, 2024 at 6:08 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: >> >> Amery Hung <ameryhung@xxxxxxxxx> writes: >> >> > On Wed, Feb 28, 2024 at 6:36 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: >> >> >> >> Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> writes: >> >> >> >> > On Mon, 26 Feb 2024 at 19:04, Amery Hung <ameryhung@xxxxxxxxx> wrote: >> >> >> >> >> >> Hi all, >> >> >> >> >> >> I would like to discuss bpf qdisc in the BPF track. As we now try to >> >> >> support bpf qdisc using struct_ops, we found some limitations of >> >> >> bpf/struct_ops. While some have been discussed briefly on the mailing >> >> >> list, we can discuss in more detail to make struct_ops a more >> >> >> generic/palatable approach to replace kernel functions. >> >> >> >> >> >> In addition, I would like to discuss supporting adding kernel objects >> >> >> to bpf_list/rbtree, which may have performance benefits in some >> >> >> applications and can improve the programming experience. The current >> >> >> bpf fq in the RFC has a 6% throughput loss compared to the native >> >> >> counterpart due to memory allocation in enqueue() to store skb kptr. >> >> >> With a POC I wrote that allows adding skb to bpf_list, the throughput >> >> >> becomes comparable. We can discuss the approach and other potential >> >> >> use cases. >> >> >> >> >> > >> >> > When discussing this with Toke (Cc'd) long ago for the XDP queueing >> >> > patch set, we discussed the same thing, in that the sk_buff already >> >> > has space for a list or rbnode due to it getting queued in other >> >> > layers (TCP OoO queue, qdiscs, etc.) so it would make sense to teach >> >> > the verifier that it is a valid bpf_list_node and bpf_rb_node and >> >> > allow inserting it as an element into a BPF list or rbtree. Back then >> >> > we didn't add that as the posting only used the PIFO map. >> >> > >> >> > I think not only sk_buff, you can do a similar thing with xdp_buff as >> >> > well. >> >> >> >> Yeah, I agree that allowing skbs to be inserted directly into a BPF >> >> rbtree would make a lot of sense if it can be done safely. I am less >> >> sure about xdp_frames, mostly for performance reasons, but if it does >> >> turn out to be useful whichever mechanism we add for skbs should be >> >> fairly straight forward to reuse. >> >> >> >> > The verifier side changes should be fairly minimal, just allowing the >> >> > use of a known kernel type as the contained object in a list or >> >> > rbtree, and the field pointing to this allowlisted list or rbnode. >> >> >> >> I think one additional concern here is how we ensure that an skb has >> >> been correctly removed from any rbtrees it sits in before it is being >> >> transmitted to another part of the stack? >> > >> > I think one solution is to disallow shared ownership of skb in >> > multiple lists or rbtrees. That is, users should not be able to >> > acquire the reference of an skb from the ctx more than once in >> > ".enqueue" or using bpf_refcount_acquire(). >> >> Can the verifier enforce this, even across multiple enqueue/dequeue >> calls? Not sure if acquiring a refcount ensures that the rbtree entry >> has been cleared? >> >> Basically, I'm worried about a dequeue() op that does something like: >> >> skb = rbtree_head(); >> // skb->rbnode is not cleared >> return skb; // stack will keep processing it >> >> I'm a little fuzzy on how the bpf rbtree stuff works, though, so maybe >> the verifier is already ensuring that a node cannot be read from a tree >> without being properly cleared from it? >> > > I see what you are saying now, and thanks Kumar for the clarification! > > I was thinking about how to prevent an skb from being added to lists > and rbtrees at the same time, since list and rbnode share the same > space. Hence the suggestion. Ah, yes, good point, that is also a concern, certainly! -Toke