Cong Wang <xiyou.wangcong@xxxxxxxxx> writes: > On Thu, Jul 14, 2022 at 12:46:54PM +0200, Toke Høiland-Jørgensen wrote: >> Stanislav Fomichev <sdf@xxxxxxxxxx> writes: >> >> > On Wed, Jul 13, 2022 at 2:52 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: >> >> >> >> Stanislav Fomichev <sdf@xxxxxxxxxx> writes: >> >> >> >> > On Wed, Jul 13, 2022 at 4:14 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: >> >> >> >> >> >> Packet forwarding is an important use case for XDP, which offers >> >> >> significant performance improvements compared to forwarding using the >> >> >> regular networking stack. However, XDP currently offers no mechanism to >> >> >> delay, queue or schedule packets, which limits the practical uses for >> >> >> XDP-based forwarding to those where the capacity of input and output links >> >> >> always match each other (i.e., no rate transitions or many-to-one >> >> >> forwarding). It also prevents an XDP-based router from doing any kind of >> >> >> traffic shaping or reordering to enforce policy. >> >> >> >> >> >> This series represents a first RFC of our attempt to remedy this lack. The >> >> >> code in these patches is functional, but needs additional testing and >> >> >> polishing before being considered for merging. I'm posting it here as an >> >> >> RFC to get some early feedback on the API and overall design of the >> >> >> feature. >> >> >> >> >> >> DESIGN >> >> >> >> >> >> The design consists of three components: A new map type for storing XDP >> >> >> frames, a new 'dequeue' program type that will run in the TX softirq to >> >> >> provide the stack with packets to transmit, and a set of helpers to dequeue >> >> >> packets from the map, optionally drop them, and to schedule an interface >> >> >> for transmission. >> >> >> >> >> >> The new map type is modelled on the PIFO data structure proposed in the >> >> >> literature[0][1]. It represents a priority queue where packets can be >> >> >> enqueued in any priority, but is always dequeued from the head. From the >> >> >> XDP side, the map is simply used as a target for the bpf_redirect_map() >> >> >> helper, where the target index is the desired priority. >> >> > >> >> > I have the same question I asked on the series from Cong: >> >> > Any considerations for existing carousel/edt-like models? >> >> >> >> Well, the reason for the addition in patch 5 (continuously increasing >> >> priorities) is exactly to be able to implement EDT-like behaviour, where >> >> the priority is used as time units to clock out packets. >> > >> > Ah, ok, I didn't read the patches closely enough. I saw some limits >> > for the ranges and assumed that it wasn't capable of efficiently >> > storing 64-bit timestamps.. >> >> The goal is definitely to support full 64-bit priorities. Right now you >> have to start out at 0 but can go on for a full 64 bits, but that's a >> bit of an API wart that I'd like to get rid of eventually... >> >> >> > Can we make the map flexible enough to implement different qdisc >> >> > policies? >> >> >> >> That's one of the things we want to be absolutely sure about. We are >> >> starting out with the PIFO map type because the literature makes a good >> >> case that it is flexible enough to implement all conceivable policies. >> >> The goal of the test harness linked as note [4] is to actually examine >> >> this; Frey is our PhD student working on this bit. >> >> >> >> Thus far we haven't hit any limitations on this, but we'll need to add >> >> more policies before we are done with this. Another consideration is >> >> performance, of course, so we're also planning to do a comparison with a >> >> more traditional "bunch of FIFO queues" type data structure for at least >> >> a subset of the algorithms. Kartikeya also had an idea for an >> >> alternative way to implement a priority queue using (semi-)lockless >> >> skiplists, which may turn out to perform better. >> >> >> >> If there's any particular policy/algorithm you'd like to see included in >> >> this evaluation, please do let us know, BTW! :) >> > >> > I honestly am not sure what the bar for accepting this should be. But >> > on the Cong's series I mentioned Martin's CC bpf work as a great >> > example of what we should be trying to do for qdisc-like maps. Having >> > a bpf version of fq/fq_codel/whatever_other_complex_qdisc might be >> > very convincing :-) >> >> Just doing flow queueing is quite straight forward with PIFOs. We're >> working on fq_codel. Personally I also want to implement something that >> has feature parity with sch_cake (which includes every feature and the >> kitchen sink already) :) > > And how exactly would you plan to implement Least Slack Time First with > PIFOs? See https://www.usenix.org/system/files/nsdi20-paper-sharma.pdf. > Can you be as specific as possible ideally with a pesudo code? By sticking flows into the PIFO instead of individual packets. Basically: enqueue: flow_id = hash_pkt(pkt); flow_pifo = &flows[flow_id]; pifo_enqueue(flow_pifo, pkt, 0); // always enqueue at rank 0, so effectively a FIFO pifo_enqueue(toplevel_pifo, flow_id, compute_rank(flow_id)); dequeue: flow_id = pifo_dequeue(toplevel_pifo); flow_pifo = &flows[flow_id]; pkt = pifo_dequeue(flow_pifo); pifo_enqueue(toplevel_pifo, flow_id, compute_rank(flow_id)); // re-enqueue return pkt; We have not gotten around to doing a full implementation of this yet, but SRPT/LSTF is on our list of algorithms to add :) > BTW, this is very easy to do with my approach as no FO limitations. How does being able to dequeue out-of-order actually help with this particular scheme? On dequeue you still process things in priority order? -Toke