On Thu, Jul 14, 2022 at 12:46:54PM +0200, Toke Høiland-Jørgensen wrote: > Stanislav Fomichev <sdf@xxxxxxxxxx> writes: > > > On Wed, Jul 13, 2022 at 2:52 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > >> > >> Stanislav Fomichev <sdf@xxxxxxxxxx> writes: > >> > >> > On Wed, Jul 13, 2022 at 4:14 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > >> >> > >> >> Packet forwarding is an important use case for XDP, which offers > >> >> significant performance improvements compared to forwarding using the > >> >> regular networking stack. However, XDP currently offers no mechanism to > >> >> delay, queue or schedule packets, which limits the practical uses for > >> >> XDP-based forwarding to those where the capacity of input and output links > >> >> always match each other (i.e., no rate transitions or many-to-one > >> >> forwarding). It also prevents an XDP-based router from doing any kind of > >> >> traffic shaping or reordering to enforce policy. > >> >> > >> >> This series represents a first RFC of our attempt to remedy this lack. The > >> >> code in these patches is functional, but needs additional testing and > >> >> polishing before being considered for merging. I'm posting it here as an > >> >> RFC to get some early feedback on the API and overall design of the > >> >> feature. > >> >> > >> >> DESIGN > >> >> > >> >> The design consists of three components: A new map type for storing XDP > >> >> frames, a new 'dequeue' program type that will run in the TX softirq to > >> >> provide the stack with packets to transmit, and a set of helpers to dequeue > >> >> packets from the map, optionally drop them, and to schedule an interface > >> >> for transmission. > >> >> > >> >> The new map type is modelled on the PIFO data structure proposed in the > >> >> literature[0][1]. It represents a priority queue where packets can be > >> >> enqueued in any priority, but is always dequeued from the head. From the > >> >> XDP side, the map is simply used as a target for the bpf_redirect_map() > >> >> helper, where the target index is the desired priority. > >> > > >> > I have the same question I asked on the series from Cong: > >> > Any considerations for existing carousel/edt-like models? > >> > >> Well, the reason for the addition in patch 5 (continuously increasing > >> priorities) is exactly to be able to implement EDT-like behaviour, where > >> the priority is used as time units to clock out packets. > > > > Ah, ok, I didn't read the patches closely enough. I saw some limits > > for the ranges and assumed that it wasn't capable of efficiently > > storing 64-bit timestamps.. > > The goal is definitely to support full 64-bit priorities. Right now you > have to start out at 0 but can go on for a full 64 bits, but that's a > bit of an API wart that I'd like to get rid of eventually... > > >> > Can we make the map flexible enough to implement different qdisc > >> > policies? > >> > >> That's one of the things we want to be absolutely sure about. We are > >> starting out with the PIFO map type because the literature makes a good > >> case that it is flexible enough to implement all conceivable policies. > >> The goal of the test harness linked as note [4] is to actually examine > >> this; Frey is our PhD student working on this bit. > >> > >> Thus far we haven't hit any limitations on this, but we'll need to add > >> more policies before we are done with this. Another consideration is > >> performance, of course, so we're also planning to do a comparison with a > >> more traditional "bunch of FIFO queues" type data structure for at least > >> a subset of the algorithms. Kartikeya also had an idea for an > >> alternative way to implement a priority queue using (semi-)lockless > >> skiplists, which may turn out to perform better. > >> > >> If there's any particular policy/algorithm you'd like to see included in > >> this evaluation, please do let us know, BTW! :) > > > > I honestly am not sure what the bar for accepting this should be. But > > on the Cong's series I mentioned Martin's CC bpf work as a great > > example of what we should be trying to do for qdisc-like maps. Having > > a bpf version of fq/fq_codel/whatever_other_complex_qdisc might be > > very convincing :-) > > Just doing flow queueing is quite straight forward with PIFOs. We're > working on fq_codel. Personally I also want to implement something that > has feature parity with sch_cake (which includes every feature and the > kitchen sink already) :) And how exactly would you plan to implement Least Slack Time First with PIFOs? See https://www.usenix.org/system/files/nsdi20-paper-sharma.pdf. Can you be as specific as possible ideally with a pesudo code? BTW, this is very easy to do with my approach as no FO limitations. Thanks!