Re: [RFC Patch net-next] net_sched: introduce eBPF based Qdisc

On Fri, Aug 20, 2021 at 06:02:40PM -0700, Cong Wang wrote:
> From: Cong Wang <cong.wang@xxxxxxxxxxxxx>
> 
> This *incomplete* patch introduces a programmable Qdisc with
> eBPF.  The goal is to make Qdisc as programmable as possible,
> that is, to replace as many existing Qdisc's as we can. ;)
> 
> The design was discussed during last LPC:
> https://linuxplumbersconf.org/event/7/contributions/679/attachments/520/1188/sch_bpf.pdf 
> 
> Here is a summary of design decisions I made:
> 
> 1. Avoid eBPF struct_ops, as it would be really hard to program
>    a Qdisc with this approach.
Please explain this in more detail.  What is currently missing
that prevents implementing a qdisc with struct_ops?

> 2. Avoid exposing skb's to user-space, which means we can't introduce
>    a map to store skb's. Instead, store them in kernel without exposure
>    to user-space.
> 
> So I choose to use priority queues to store skb's inside a
> flow and to store flows inside a Qdisc, and let eBPF programs
> decide the *relative* position of the skb within the flow and the
> *relative* order of the flows too, upon each enqueue and dequeue.
> Each flow is also exposed to user as a TC class, like many other
> classful Qdisc's.
> 
> Although the biggest limitation is obviously that users can
> not traverse the packets or flows inside the Qdisc, I think
> at least they could store any global information of interest
> inside their own map, and the map can be shared between enqueue and
> dequeue. For example, users could use the skb pointer as the key and
> the rank as the value to find out the absolute order.
> 
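As a rough illustration of the two-level ordering described above, here is
a small user-space model in Python. It is not kernel or eBPF code and is
not taken from the patch: the names RankedQueue, ToyQdisc, classify and
rank_map are hypothetical, lower rank is assumed to mean "dequeue first",
and a plain callback stands in for the eBPF enqueue program. The packet id
stands in for the skb pointer as the map key.

```python
import heapq
import itertools


class RankedQueue:
    """Priority queue ordered by rank; equal ranks dequeue in FIFO order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker, keeps items out of comparisons

    def push(self, rank, item):
        heapq.heappush(self._heap, (rank, next(self._seq), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


class ToyQdisc:
    """Flows ordered by rank; packets inside each flow ordered by rank."""

    def __init__(self, enqueue_prog):
        self.enqueue_prog = enqueue_prog  # stands in for the eBPF enqueue program
        self.flows = {}      # flow id -> RankedQueue of packets
        self.flow_rank = {}  # flow id -> most recent rank given to that flow
        self.rank_map = {}   # packet id -> rank (assumed model of the user's own map)

    def enqueue(self, pkt):
        flow_id, flow_rank, pkt_rank = self.enqueue_prog(pkt)
        self.flows.setdefault(flow_id, RankedQueue()).push(pkt_rank, pkt)
        self.flow_rank[flow_id] = flow_rank
        self.rank_map[pkt["id"]] = pkt_rank

    def dequeue(self):
        if not self.flows:
            return None
        # Pick the lowest-ranked flow, then the lowest-ranked packet inside it.
        flow_id = min(self.flows, key=self.flow_rank.__getitem__)
        pkt = self.flows[flow_id].pop()
        if not self.flows[flow_id]:
            del self.flows[flow_id]
            del self.flow_rank[flow_id]
        del self.rank_map[pkt["id"]]
        return pkt


def classify(pkt):
    # Hypothetical policy: flow rank = priority, packet rank = deadline.
    return pkt["flow"], pkt["prio"], pkt["deadline"]


def demo():
    q = ToyQdisc(classify)
    packets = [("a1", "a", 1, 30), ("b1", "b", 0, 20),
               ("a2", "a", 1, 10), ("b2", "b", 0, 5)]
    for pid, flow, prio, deadline in packets:
        q.enqueue({"id": pid, "flow": flow, "prio": prio, "deadline": deadline})
    order = []
    while (pkt := q.dequeue()) is not None:
        order.append(pkt["id"])
    return order
```

In this model the caller never walks the queues; it only sees what
dequeue() returns, and rank_map is the only place where per-packet state
is visible from outside, which mirrors the limitation stated above.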
> One of the challenges is how to interact with existing TC infra,
> for instance, if users install TC filters on this Qdisc, should
> we respect this by ignoring or rejecting eBPF enqueue program
> attached or vice versa? Should we allow users to replace each
> priority queue of a class with a regular Qdisc?
> 
> Any high-level feedback is welcome. Please do not review any
> coding details until RFC tag is removed.
> 
> Cc: Jamal Hadi Salim <jhs@xxxxxxxxxxxx>
> Cc: Jiri Pirko <jiri@xxxxxxxxxxx>
> Signed-off-by: Cong Wang <cong.wang@xxxxxxxxxxxxx>


