On 15 Apr 2019, at 9:32, Jesper Dangaard Brouer wrote:
On Mon, 15 Apr 2019 13:59:03 +0200 Björn Töpel
<bjorn.topel@xxxxxxxxx> wrote:
Hi,
As you probably can derive from the amount of time this is taking, I'm
not really satisfied with the design of per-queue XDP program. (That,
plus I'm a terribly slow hacker... ;-)) I'll try to expand my thinking
in this mail!
Beware, it's kind of a long post, and it's all over the place.
Cc'ing all the XDP-maintainers (and netdev).
There are a number of ways of setting up flows in the kernel, e.g.
* Connecting/accepting a TCP socket (in-band)
* Using tc-flower (out-of-band)
* ethtool (out-of-band)
* ...
The first acts on sockets, the second on netdevs. Then there's ethtool
to configure RSS, and the RSS-on-steroids rxhash/ntuple that can steer
to queues. Most users care about sockets and netdevices. Queues are
more of an implementation detail of Rx or for QoS on the Tx side.
Let me first acknowledge that the current Linux tools to administer
HW filters are lacking (well, they suck). We know the hardware is
capable, as DPDK has a full API for this called rte_flow[1]. If nothing
else you/we can use the DPDK API to create a program to configure the
hardware, examples here[2].
[1] https://doc.dpdk.org/guides/prog_guide/rte_flow.html
[2] https://doc.dpdk.org/guides/howto/rte_flow.html
XDP is something that we can attach to a netdevice. Again, very
natural from a user perspective. As for XDP sockets, the current
mechanism is that we attach to an existing netdevice queue. Ideally
what we'd like is to *remove* the queue concept. A better approach
would be creating the socket and setting it up -- but not binding it to
a queue. Instead just binding it to a netdevice (or crazier, just
creating a socket without a netdevice).
Let me just remind everybody that the AF_XDP performance gains come
from binding the resource, which allows for lock-free semantics, as
explained here[3].
[3] https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP#where-does-af_xdp-performance-come-from
The socket is an endpoint, where I'd like data to end up (or get sent
from). If the kernel can attach the socket to a hardware queue,
there's zerocopy; if not, copy-mode. Ditto for Tx.
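For reference, a minimal sketch of the bind address an AF_XDP socket
uses today, which is where the netdevice *and* the queue id come in.
The struct and flag names are from <linux/if_xdp.h>; the helper
xsk_bind_addr() is a hypothetical name used only for illustration. A
real bind() only succeeds after the UMEM and the rings have been set
up, which is left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>   /* struct sockaddr_xdp, XDP_COPY, XDP_ZEROCOPY */

#ifndef AF_XDP
#define AF_XDP 44           /* kernel value; some older libcs do not define it */
#endif

/*
 * Hypothetical helper: build the address passed to bind() on an AF_XDP
 * socket. Note that a queue id is mandatory, which is the coupling the
 * mail would like to get rid of.
 *
 * flags: 0 lets the kernel try zerocopy first and fall back to copy
 *        mode; XDP_ZEROCOPY or XDP_COPY force one of the two.
 */
static struct sockaddr_xdp xsk_bind_addr(uint32_t ifindex, uint32_t queue_id,
                                         uint16_t flags)
{
    struct sockaddr_xdp sxdp;

    assert(ifindex != 0);   /* ifindex 0 is never a valid netdevice */

    memset(&sxdp, 0, sizeof(sxdp));
    sxdp.sxdp_family   = AF_XDP;
    sxdp.sxdp_ifindex  = ifindex;
    sxdp.sxdp_queue_id = queue_id;
    sxdp.sxdp_flags    = flags;
    return sxdp;
}
```

Usage would be along the lines of
"struct sockaddr_xdp a = xsk_bind_addr(if_nametoindex("eth0"), 0, 0);"
followed by bind(fd, (struct sockaddr *)&a, sizeof(a)), where "eth0"
and queue 0 are just placeholder values.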
Well, XDP programs per RXQ are just a building block to achieve this.
As Van Jacobson explains[4], sockets or applications "register" a
"transport signature", and get back a "channel". In our case, the
netdev-global XDP program is our way to register/program these
transport signatures and redirect (e.g. into the AF_XDP socket).
This requires some work in software to parse and match transport
signatures to sockets. The XDP programs per RXQ are a way to get
hardware to perform this filtering for us.
[4] http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf
Does a user (control plane) want/need to care about queues? Just
create a flow to a socket (out-of-band or in-band) or to a netdevice
(out-of-band).
A userspace "control-plane" program could hide the setup and use
whatever optimizations the system/hardware can provide. VJ[4] e.g.
suggests that the "listen" socket first registers the transport
signature (with the driver) on "accept()". If the HW supports the
DPDK-rte_flow API we can register a 5-tuple (or create TC-HW rules) and
load our "transport-signature" XDP prog on the queue number we choose.
If not, then our netdev-global XDP prog needs a hash-table keyed on the
5-tuple and has to do the 5-tuple parsing.
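To make the software fallback concrete, here is a rough userspace
model of a 5-tuple table that maps a "transport signature" to a
socket. This is only a sketch: in a real XDP program the table would
be a BPF map and the key would come from parsing the packet headers.
All names (five_tuple, flow_register, flow_lookup, sock_id,
FLOW_SLOTS) are made up for illustration, and the FNV-1a hash is just
a stand-in.

```c
#include <assert.h>
#include <stdint.h>

#define FLOW_SLOTS 64           /* hypothetical table size, power of two */

struct five_tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

struct flow_entry {
    struct five_tuple key;
    int sock_id;                /* hypothetical handle for the target socket */
    int used;
};

struct flow_table {
    struct flow_entry slot[FLOW_SLOTS];
};

static int tuple_eq(const struct five_tuple *a, const struct five_tuple *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

/* Fold one 32-bit value into an FNV-1a hash, byte by byte. */
static uint32_t fnv1a_mix(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 16777619u;
    }
    return h;
}

/* Hash the fields one by one so struct padding never enters the hash. */
static uint32_t tuple_hash(const struct five_tuple *t)
{
    uint32_t h = 2166136261u;

    h = fnv1a_mix(h, t->saddr);
    h = fnv1a_mix(h, t->daddr);
    h = fnv1a_mix(h, ((uint32_t)t->sport << 16) | t->dport);
    h = fnv1a_mix(h, t->proto);
    return h;
}

/* "Register" a transport signature; returns 0, or -1 if the table is full. */
static int flow_register(struct flow_table *ft, const struct five_tuple *t,
                         int sock_id)
{
    uint32_t i = tuple_hash(t) & (FLOW_SLOTS - 1);

    assert(sock_id >= 0);       /* -1 is reserved for "no match" */
    for (int n = 0; n < FLOW_SLOTS; n++, i = (i + 1) & (FLOW_SLOTS - 1)) {
        struct flow_entry *e = &ft->slot[i];

        if (!e->used || tuple_eq(&e->key, t)) {
            e->key = *t;
            e->sock_id = sock_id;
            e->used = 1;
            return 0;
        }
    }
    return -1;
}

/* Per-packet lookup; returns the socket handle, or -1 for "pass to stack". */
static int flow_lookup(const struct flow_table *ft, const struct five_tuple *t)
{
    uint32_t i = tuple_hash(t) & (FLOW_SLOTS - 1);

    for (int n = 0; n < FLOW_SLOTS; n++, i = (i + 1) & (FLOW_SLOTS - 1)) {
        const struct flow_entry *e = &ft->slot[i];

        if (!e->used)
            return -1;          /* linear probing: an empty slot ends the search */
        if (tuple_eq(&e->key, t))
            return e->sock_id;
    }
    return -1;
}
```

The point of the HW-filter path is that this parse + hash + lookup
cost disappears from the per-packet path, since a packet arriving on
the chosen queue already implies the match.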
Creating netdevices via HW filter into queues is an interesting idea.
DPDK has an example here[5] of how to send packets per flow (via
ethtool filter setup even!) to queues that end up in SR-IOV devices.
[5] https://doc.dpdk.org/guides/howto/flow_bifurcation.html
Do we envision any other uses for per-queue XDP other than AF_XDP? If
not, it would make *more* sense to attach the XDP program to the
socket (e.g. if the endpoint would like to use kernel data structures
via XDP).
As demonstrated in [5] you can use (ethtool) hardware filters to
redirect packets into VFs (Virtual Functions).
I also want us to extend XDP to allow for redirect from a PF (Physical
Function) into a VF (Virtual Function). First, the netdev-global
XDP-prog needs to support this (maybe extend xdp_rxq_info with PF + VF
info). Next, configure a HW filter to queue# and load an XDP prog on
that queue# that only "redirects" to a single VF. Now if driver+HW
supports it, it can "eliminate" the per-queue XDP-prog and do
everything in HW.
One thing I'd like to see is to have RSS distribute incoming traffic
across a set of queues. The application would open a set of xsk's which
are bound to those queues.
I'm not seeing how a transport signature would achieve this. The
current tooling seems to treat the queue as the basic building block,
which seems generally appropriate.
Whittling things down (receiving packets only for a specific flow) could
be achieved by creating a queue which only contains those packets which
matched via some form of classification (or perhaps steered to a VF
device), aka [5] above. Exposing multiple queues allows load
distribution for those apps which care about it.
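As a rough model of the RSS case: the NIC computes a hash per flow and
uses it to index an indirection table whose entries are queue numbers.
The sketch below only models that table. The hash is taken as an input
(real hardware typically computes it itself), and the names rss_ctx,
rss_spread, rss_queue and RETA_SIZE are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RETA_SIZE 128           /* hypothetical indirection-table size */

struct rss_ctx {
    uint8_t reta[RETA_SIZE];    /* hash bucket -> Rx queue number */
};

/* Spread the buckets round-robin over queues [first, first + nqueues). */
static void rss_spread(struct rss_ctx *r, unsigned first, unsigned nqueues)
{
    assert(nqueues > 0 && first + nqueues <= 256);
    for (unsigned i = 0; i < RETA_SIZE; i++)
        r->reta[i] = (uint8_t)(first + i % nqueues);
}

/* Queue a packet lands on, given the flow hash computed by the NIC. */
static unsigned rss_queue(const struct rss_ctx *r, uint32_t flow_hash)
{
    return r->reta[flow_hash % RETA_SIZE];
}
```

With rss_spread(&r, 4, 4) the application would open four xsk's, one
bound to each of queues 4..7. Packets of one flow always carry the
same hash and so always land on the same socket, while different flows
are spread across the set.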
--
Jonathan