On Fri, Aug 20, 2021 at 06:02:40PM -0700, Cong Wang wrote:
> From: Cong Wang <cong.wang@xxxxxxxxxxxxx>
>
> This *incomplete* patch introduces a programmable Qdisc with
> eBPF. The goal is to make Qdisc as programmable as possible,
> that is, to replace as many existing Qdisc's as we can. ;)
>
> The design was discussed during last LPC:
> https://linuxplumbersconf.org/event/7/contributions/679/attachments/520/1188/sch_bpf.pdf
>
> Here is a summary of design decisions I made:
>
> 1. Avoid eBPF struct_ops, as it would be really hard to program
>    a Qdisc with this approach.

Please explain more about this. What is currently missing
to make a qdisc in struct_ops possible?

> 2. Avoid exposing skb's to user-space, which means we can't introduce
>    a map to store skb's. Instead, store them in kernel without exposure
>    to user-space.
>
> So I chose to use priority queues to store skb's inside a
> flow and to store flows inside a Qdisc, and let eBPF programs
> decide the *relative* position of the skb within the flow and the
> *relative* order of the flows too, upon each enqueue and dequeue.
> Each flow is also exposed to user as a TC class, like many other
> classful Qdisc's.
>
> Although the biggest limitation is obviously that users can
> not traverse the packets or flows inside the Qdisc, I think
> at least they could store any global information of interest
> inside their own map and the map can be shared between enqueue and
> dequeue. For example, users could use skb pointer as key and
> rank as a value to find out the absolute order.
>
> One of the challenges is how to interact with existing TC infra,
> for instance, if users install TC filters on this Qdisc, should
> we respect this by ignoring or rejecting eBPF enqueue program
> attached or vice versa? Should we allow users to replace each
> priority queue of a class with a regular Qdisc?
>
> Any high-level feedback is welcome. Please do not review any
> coding details until RFC tag is removed.
>
> Cc: Jamal Hadi Salim <jhs@xxxxxxxxxxxx>
> Cc: Jiri Pirko <jiri@xxxxxxxxxxx>
> Signed-off-by: Cong Wang <cong.wang@xxxxxxxxxxxxx>
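
To check that I am reading the queueing model in (2) correctly, here is
a minimal user-space sketch in Python. It is not kernel code and is not
taken from the patch. All names (RankedQueue, SketchQdisc, enqueue_prog)
are hypothetical. The enqueue_prog callback stands in for the eBPF
enqueue program and is assumed to return a flow id, a flow rank and a
packet rank. As a simplification, dequeue reuses the flow's rank instead
of running a separate dequeue program.

```python
import heapq
import itertools


class RankedQueue:
    """Min-heap ordered by rank; equal ranks keep insertion order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker, items are never compared

    def push(self, rank, item):
        heapq.heappush(self._heap, (rank, next(self._seq), item))

    def pop(self):
        rank, _, item = heapq.heappop(self._heap)
        return rank, item

    def __len__(self):
        return len(self._heap)


class SketchQdisc:
    """Two levels: packets ranked inside a flow, flows ranked in the qdisc."""

    def __init__(self, enqueue_prog):
        # Hypothetical stand-in for the eBPF enqueue program:
        # pkt -> (flow_id, flow_rank, pkt_rank)
        self.enqueue_prog = enqueue_prog
        self.flows = {}              # flow_id -> RankedQueue of packets
        self.active = RankedQueue()  # non-empty flows, ordered by flow rank

    def enqueue(self, pkt):
        flow_id, flow_rank, pkt_rank = self.enqueue_prog(pkt)
        queue = self.flows.setdefault(flow_id, RankedQueue())
        if len(queue) == 0:
            # Flow becomes active; place it relative to the other flows.
            self.active.push(flow_rank, flow_id)
        queue.push(pkt_rank, pkt)

    def dequeue(self):
        if len(self.active) == 0:
            return None
        flow_rank, flow_id = self.active.pop()
        queue = self.flows[flow_id]
        _, pkt = queue.pop()
        if len(queue) > 0:
            # Simplification: re-insert with the same rank. In the RFC a
            # dequeue program would decide the new relative order.
            self.active.push(flow_rank, flow_id)
        return pkt
```

Note that nothing outside SketchQdisc can walk self.flows or
self.active, which mirrors the limitation described above: the program
only picks relative ranks, and anything global has to live in a
separate map.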