Re: [LSF/MM/BPF TOPIC] bpf qdisc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 1, 2024 at 6:08 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Amery Hung <ameryhung@xxxxxxxxx> writes:
>
> > On Wed, Feb 28, 2024 at 6:36 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >>
> >> Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> writes:
> >>
> >> > On Mon, 26 Feb 2024 at 19:04, Amery Hung <ameryhung@xxxxxxxxx> wrote:
> >> >>
> >> >> Hi all,
> >> >>
> >> >> I would like to discuss bpf qdisc in the BPF track. As we now try to
> >> >> support bpf qdisc using struct_ops, we found some limitations of
> >> >> bpf/struct_ops. While some have been discussed briefly on the mailing
> >> >> list, we can discuss in more detail to make struct_ops a more
> >> >> generic/palatable approach to replace kernel functions.
> >> >>
> >> >> In addition, I would like to discuss supporting adding kernel objects
> >> >> to bpf_list/rbtree, which may have performance benefits in some
> >> >> applications and can improve the programming experience. The current
> >> >> bpf fq in the RFC has a 6% throughput loss compared to the native
> >> >> counterpart due to memory allocation in enqueue() to store skb kptr.
> >> >> With a POC I wrote that allows adding skb to bpf_list, the throughput
> >> >> becomes comparable. We can discuss the approach and other potential
> >> >> use cases.
> >> >>
> >> >
> >> > When discussing this with Toke (Cc'd) long ago for the XDP queueing
> >> > patch set, we discussed the same thing, in that the sk_buff already
> >> > has space for a list or rbnode due to it getting queued in other
> >> > layers (TCP OoO queue, qdiscs, etc.) so it would make sense to teach
> >> > the verifier that it is a valid bpf_list_node and bpf_rb_node and
> >> > allow inserting it as an element into a BPF list or rbtree. Back then
> >> > we didn't add that as the posting only used the PIFO map.
> >> >
> >> > I think not only sk_buff, you can do a similar thing with xdp_buff as
> >> > well.
> >>
> >> Yeah, I agree that allowing skbs to be inserted directly into a BPF
> >> rbtree would make a lot of sense if it can be done safely. I am less
> >> sure about xdp_frames, mostly for performance reasons, but if it does
> >> turn out to be useful whichever mechanism we add for skbs should be
> >> fairly straight forward to reuse.
> >>
> >> > The verifier side changes should be fairly minimal, just allowing the
> >> > use of a known kernel type as the contained object in a list or
> >> > rbtree, and the field pointing to this allowlisted list or rbnode.
> >>
> >> I think one additional concern here is how we ensure that an skb has
> >> been correctly removed from any rbtrees it sits in before it is being
> >> transmitted to another part of the stack?
> >
> > I think one solution is to disallow shared ownership of skb in
> > multiple lists or rbtrees. That is, users should not be able to
> > acquire the reference of an skb from the ctx more than once in
> > ".enqueue" or using bpf_refcount_acquire().
>
> Can the verifier enforce this, even across multiple enqueue/dequeue
> calls? Not sure if acquiring a refcount ensures that the rbtree entry
> has been cleared?
>
> Basically, I'm worried about a dequeue() op that does something like:
>
> skb = rbtree_head();
> // skb->rbnode is not cleared
> return skb; // stack will keep processing it
>
> I'm a little fuzzy on how the bpf rbtree stuff works, though, so maybe
> the verifier is already ensuring that a node cannot be read from a tree
> without being properly cleared from it?
>

I see what you are saying now, and thanks Kumar for the clarification!

I was thinking about how to prevent an skb from being added to lists
and rbtrees at the same time, since list and rbnode share the same
space. Hence the suggestion.

> -Toke
>





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux