Re: [RFC PATCH 00/17] xdp: Add packet queueing and scheduling capabilities

Cong Wang <xiyou.wangcong@xxxxxxxxx> writes:

> On Thu, Jul 14, 2022 at 12:46:54PM +0200, Toke Høiland-Jørgensen wrote:
>> Stanislav Fomichev <sdf@xxxxxxxxxx> writes:
>> 
>> > On Wed, Jul 13, 2022 at 2:52 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>> >>
>> >> Stanislav Fomichev <sdf@xxxxxxxxxx> writes:
>> >>
>> >> > On Wed, Jul 13, 2022 at 4:14 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>> >> >>
>> >> >> Packet forwarding is an important use case for XDP, which offers
>> >> >> significant performance improvements compared to forwarding using the
>> >> >> regular networking stack. However, XDP currently offers no mechanism to
>> >> >> delay, queue or schedule packets, which limits the practical uses for
>> >> >> XDP-based forwarding to those where the capacity of input and output links
>> >> >> always match each other (i.e., no rate transitions or many-to-one
>> >> >> forwarding). It also prevents an XDP-based router from doing any kind of
>> >> >> traffic shaping or reordering to enforce policy.
>> >> >>
>> >> >> This series represents a first RFC of our attempt to remedy this lack. The
>> >> >> code in these patches is functional, but needs additional testing and
>> >> >> polishing before being considered for merging. I'm posting it here as an
>> >> >> RFC to get some early feedback on the API and overall design of the
>> >> >> feature.
>> >> >>
>> >> >> DESIGN
>> >> >>
>> >> >> The design consists of three components: A new map type for storing XDP
>> >> >> frames, a new 'dequeue' program type that will run in the TX softirq to
>> >> >> provide the stack with packets to transmit, and a set of helpers to dequeue
>> >> >> packets from the map, optionally drop them, and to schedule an interface
>> >> >> for transmission.
>> >> >>
>> >> >> The new map type is modelled on the PIFO data structure proposed in the
>> >> >> literature[0][1]. It represents a priority queue where packets can be
>> >> >> enqueued at any priority, but are always dequeued from the head. From the
>> >> >> XDP side, the map is simply used as a target for the bpf_redirect_map()
>> >> >> helper, where the target index is the desired priority.
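(For reference, the push-at-any-rank / pop-from-head semantics described
above can be modelled in userspace with an ordinary binary heap. This is
just an illustration, not the proposed kernel API; all names are made up.)

```python
import heapq
import itertools

class Pifo:
    """Userspace model of a PIFO: entries can be pushed at any rank,
    but are always popped from the head (lowest rank first)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO order within a rank

    def enqueue(self, item, rank):
        heapq.heappush(self._heap, (rank, next(self._seq), item))

    def dequeue(self):
        if not self._heap:
            return None
        _, _, item = heapq.heappop(self._heap)
        return item

# An EDT-style scheduler would simply use the packet's departure
# timestamp as its rank, so earlier deadlines dequeue first:
pifo = Pifo()
pifo.enqueue("pkt-late", rank=2000)   # depart at t=2000ns
pifo.enqueue("pkt-early", rank=1000)  # depart at t=1000ns
print(pifo.dequeue())  # -> pkt-early
```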
>> >> >
>> >> > I have the same question I asked on the series from Cong:
>> >> > Any considerations for existing carousel/edt-like models?
>> >>
>> >> Well, the reason for the addition in patch 5 (continuously increasing
>> >> priorities) is exactly to be able to implement EDT-like behaviour, where
>> >> the priority is used as time units to clock out packets.
>> >
>> > Ah, ok, I didn't read the patches closely enough. I saw some limits
>> > for the ranges and assumed that it wasn't capable of efficiently
>> > storing 64-bit timestamps..
>> 
>> The goal is definitely to support full 64-bit priorities. Right now you
>> have to start at 0, but from there the full 64-bit range is usable;
>> that's a bit of an API wart that I'd like to get rid of eventually...
>> 
>> >> > Can we make the map flexible enough to implement different qdisc
>> >> > policies?
>> >>
>> >> That's one of the things we want to be absolutely sure about. We are
>> >> starting out with the PIFO map type because the literature makes a good
>> >> case that it is flexible enough to implement all conceivable policies.
>> >> The goal of the test harness linked as note [4] is to actually examine
>> >> this; Frey is our PhD student working on this bit.
>> >>
>> >> Thus far we haven't hit any limitations on this, but we'll need to add
>> >> more policies before we are done with this. Another consideration is
>> >> performance, of course, so we're also planning to do a comparison with a
>> >> more traditional "bunch of FIFO queues" type data structure for at least
>> >> a subset of the algorithms. Kartikeya also had an idea for an
>> >> alternative way to implement a priority queue using (semi-)lockless
>> >> skiplists, which may turn out to perform better.
>> >>
>> >> If there's any particular policy/algorithm you'd like to see included in
>> >> this evaluation, please do let us know, BTW! :)
>> >
>> > I honestly am not sure what the bar for accepting this should be. But
>> > on the Cong's series I mentioned Martin's CC bpf work as a great
>> > example of what we should be trying to do for qdisc-like maps. Having
>> > a bpf version of fq/fq_codel/whatever_other_complex_qdisc might be
>> > very convincing :-)
>> 
>> Just doing flow queueing is quite straightforward with PIFOs. We're
>> working on fq_codel. Personally I also want to implement something that
>> has feature parity with sch_cake (which includes every feature and the
>> kitchen sink already) :)
>
> And how exactly would you plan to implement Least Slack Time First with
> PIFOs?  See https://www.usenix.org/system/files/nsdi20-paper-sharma.pdf.
> Can you be as specific as possible, ideally with pseudocode?

By sticking flows into the PIFO instead of individual packets.
Basically:

enqueue:

flow_id = hash_pkt(pkt);
flow_pifo = &flows[flow_id];
pifo_enqueue(flow_pifo, pkt, 0); // always enqueue at rank 0, so effectively a FIFO
if (!pifo_contains(toplevel_pifo, flow_id)) // don't add duplicate flow entries
        pifo_enqueue(toplevel_pifo, flow_id, compute_rank(flow_id));

dequeue:

flow_id = pifo_dequeue(toplevel_pifo);
flow_pifo = &flows[flow_id];
pkt = pifo_dequeue(flow_pifo);
if (!pifo_empty(flow_pifo)) // re-enqueue only while the flow is backlogged
        pifo_enqueue(toplevel_pifo, flow_id, compute_rank(flow_id));
return pkt;
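To make that concrete, the same two-level scheme can be sketched in
userspace with a dict of per-flow FIFOs and a heap standing in for the
two PIFO layers. compute_rank() below is a trivial placeholder; a real
LSTF scheduler would return the flow's remaining slack there. All names
are illustrative, not part of the proposed API.

```python
import heapq
from collections import defaultdict, deque

flows = defaultdict(deque)  # per-flow FIFOs, modelling the rank-0 flow PIFOs
toplevel = []               # heap of (rank, flow_id), modelling the top-level PIFO

def compute_rank(flow_id):
    # Placeholder: rank by backlog. An LSTF scheduler would compute the
    # flow's remaining slack here instead.
    return len(flows[flow_id])

def enqueue(pkt):
    flow_id = pkt["flow"]  # a real implementation would hash packet headers
    if not flows[flow_id]:  # flow was idle: add it to the top-level PIFO
        heapq.heappush(toplevel, (compute_rank(flow_id), flow_id))
    flows[flow_id].append(pkt)

def dequeue():
    if not toplevel:
        return None
    _, flow_id = heapq.heappop(toplevel)
    pkt = flows[flow_id].popleft()
    if flows[flow_id]:  # re-enqueue only while the flow is backlogged
        heapq.heappush(toplevel, (compute_rank(flow_id), flow_id))
    return pkt
```

Note that per-flow order is preserved (each flow's packets leave in FIFO
order) while compute_rank() decides which flow is served next.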


We have not gotten around to doing a full implementation of this yet,
but SRPT/LSTF is on our list of algorithms to add :)

> BTW, this is very easy to do with my approach, as it has no FIFO limitations.

How does being able to dequeue out-of-order actually help with this
particular scheme? On dequeue you still process things in priority
order?

-Toke




