On Mon, 15 Apr 2019 10:58:07 -0700
"Jonathan Lemon" <jonathan.lemon@xxxxxxxxx> wrote:

> On 15 Apr 2019, at 9:32, Jesper Dangaard Brouer wrote:
>
> > On Mon, 15 Apr 2019 13:59:03 +0200 Björn Töpel
> > <bjorn.topel@xxxxxxxxx> wrote:
> >
> >> Hi,
> >>
> >> As you probably can derive from the amount of time this is taking,
> >> I'm not really satisfied with the design of the per-queue XDP
> >> program. (That, plus I'm a terribly slow hacker... ;-)) I'll try to
> >> expand my thinking in this mail!
> >>
> >> Beware, it's kind of a long post, and it's all over the place.
> >
> > Cc'ing all the XDP-maintainers (and netdev).
> >
> >> There are a number of ways of setting up flows in the kernel, e.g.
> >>
> >> * Connecting/accepting a TCP socket (in-band)
> >> * Using tc-flower (out-of-band)
> >> * ethtool (out-of-band)
> >> * ...
> >>
> >> The first acts on sockets, the second on netdevs. Then there's
> >> ethtool to configure RSS, and the RSS-on-steroids rxhash/ntuple
> >> that can steer to queues. Most users care about sockets and
> >> netdevices. Queues are more of an implementation detail of Rx, or
> >> for QoS on the Tx side.
> >
> > Let me first acknowledge that the current Linux tooling to
> > administer HW filters is lacking (well, it sucks). We know the
> > hardware is capable, as DPDK has a full API for this called
> > rte_flow[1]. If nothing else, you/we can use the DPDK API to create
> > a program to configure the hardware; examples here[2].
> >
> > [1] https://doc.dpdk.org/guides/prog_guide/rte_flow.html
> > [2] https://doc.dpdk.org/guides/howto/rte_flow.html
> >
> >> XDP is something that we can attach to a netdevice. Again, very
> >> natural from a user perspective. As for XDP sockets, the current
> >> mechanism is that we attach to an existing netdevice queue. Ideally
> >> what we'd like is to *remove* the queue concept. A better approach
> >> would be creating the socket and setting it up -- but not binding
> >> it to a queue. Instead just binding it to a netdevice (or, crazier,
> >> just creating a socket without a netdevice).
> >
> > Let me just remind everybody that the AF_XDP performance gains come
> > from binding the resource, which allows for lock-free semantics, as
> > explained here[3].
> >
> > [3] https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP#where-does-af_xdp-performance-come-from
> >
> >> The socket is an endpoint, where I'd like data to end up (or get
> >> sent from). If the kernel can attach the socket to a hardware
> >> queue, there's zerocopy; if not, copy-mode. Ditto for Tx.
> >
> > Well, XDP programs per RXQ are just a building block to achieve
> > this.
> >
> > As Van Jacobson explains[4], sockets or applications "register" a
> > "transport signature", and get back a "channel". In our case, the
> > netdev-global XDP program is our way to register/program these
> > transport signatures and redirect (e.g. into the AF_XDP socket).
> > This requires some work in software to parse and match transport
> > signatures to sockets. The XDP programs per RXQ are a way to get
> > hardware to perform this filtering for us.
> >
> > [4] http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf
> >
> >> Does a user (control plane) want/need to care about queues? Just
> >> create a flow to a socket (out-of-band or in-band) or to a
> >> netdevice (out-of-band).
> >
> > A userspace "control-plane" program could hide the setup and use
> > what the system/hardware can provide of optimizations. VJ[4] e.g.
> > suggests that the "listen" socket first registers the transport
> > signature (with the driver) on "accept()". If the HW supports the
> > DPDK-rte_flow API, we can register a 5-tuple (or create TC-HW
> > rules) and load our "transport-signature" XDP prog on the queue
> > number we choose. If not, then our netdev-global XDP prog needs a
> > hash-table with 5-tuples and must do the 5-tuple parsing itself.
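
To make that software-fallback path concrete, below is a minimal sketch
of what such a netdev-global 5-tuple program could look like. This is
illustrative only (not existing code from any driver or sample): the
map names flow_map/xsks_map, the flow_key layout, the map sizes and the
IPv4/UDP-only parsing without IP options are all assumptions, and it
uses current libbpf BTF-style map definitions, which postdate this
thread. The control plane would populate flow_map when a socket
registers its transport signature, and the XSKMAP holds the AF_XDP
sockets to redirect into.

/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* 5-tuple "transport signature" (illustrative layout) */
struct flow_key {
	__u32 saddr;
	__u32 daddr;
	__u16 sport;
	__u16 dport;
	__u8  proto;
	__u8  pad[3];
};

/* AF_XDP sockets, one slot per socket we may redirect into */
struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__uint(max_entries, 64);
	__type(key, __u32);
	__type(value, __u32);
} xsks_map SEC(".maps");

/* Signatures registered by the control plane: 5-tuple -> xsks_map index */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, struct flow_key);
	__type(value, __u32);
} flow_map SEC(".maps");

SEC("xdp")
int xdp_transport_sig(struct xdp_md *ctx)
{
	void *data     = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph;
	struct udphdr *udph;
	struct flow_key key = {};
	__u32 *idx;

	if ((void *)(eth + 1) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_PASS;

	iph = (void *)(eth + 1);
	if ((void *)(iph + 1) > data_end || iph->protocol != IPPROTO_UDP)
		return XDP_PASS;

	/* Assumes no IPv4 options, for brevity */
	udph = (void *)(iph + 1);
	if ((void *)(udph + 1) > data_end)
		return XDP_PASS;

	key.saddr = iph->saddr;
	key.daddr = iph->daddr;
	key.sport = udph->source;
	key.dport = udph->dest;
	key.proto = iph->protocol;

	/* Registered flow? Redirect the frame into its AF_XDP socket. */
	idx = bpf_map_lookup_elem(&flow_map, &key);
	if (idx)
		return bpf_redirect_map(&xsks_map, *idx, 0);

	/* Everything else continues up the normal network stack. */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";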
> >
> > Creating netdevices via HW filters into queues is an interesting
> > idea. DPDK has an example here[5] of how to, per flow (via ethtool
> > filter setup even!), send packets to queues that end up in SRIOV
> > devices.
> >
> > [5] https://doc.dpdk.org/guides/howto/flow_bifurcation.html
> >
> >> Do we envision any other uses for per-queue XDP other than AF_XDP?
> >> If not, it would make *more* sense to attach the XDP program to the
> >> socket (e.g. if the endpoint would like to use kernel data
> >> structures via XDP).
> >
> > As demonstrated in [5] you can use (ethtool) hardware filters to
> > redirect packets into VFs (Virtual Functions).
> >
> > I also want us to extend XDP to allow for redirect from a PF
> > (Physical Function) into a VF (Virtual Function). First the
> > netdev-global XDP-prog needs to support this (maybe extend
> > xdp_rxq_info with PF + VF info). Next, configure a HW filter to a
> > queue# and load an XDP prog on that queue# that only "redirects" to
> > a single VF. Now, if driver+HW supports it, it can "eliminate" the
> > per-queue XDP-prog and do everything in HW.
>
> One thing I'd like to see is to have RSS distribute incoming traffic
> across a set of queues. The application would open a set of xsk's
> which are bound to those queues.

Yes. (Some) NIC hardware does support RSS-distributing incoming traffic
across a set of queues. As you can see in [5], they have an example of
this:

 testpmd> flow isolate 0 true
 testpmd> flow create 0 ingress pattern eth / ipv4 / udp / vxlan vni is 42 / end \
            actions rss queues 0 1 2 3 end / end

> I'm not seeing how a transport signature would achieve this. The
> current tooling seems to treat the queue as the basic building block,
> which seems generally appropriate.

After creating the N queues that your RSS hash distributes over, I
imagine that you load your per-queue XDP program on each of these N
queues. I don't necessarily see a need for the kernel API to expose to
userspace an API/facility to load an XDP-prog on N queues in one go
(you can just iterate over them). A sketch of the matching userspace
side, opening one AF_XDP socket per queue, follows at the end of this
mail.

> Whittling things down (receiving packets only for a specific flow)
> could be achieved by creating a queue which only contains those
> packets which matched via some form of classification (or perhaps
> steered to a VF device), aka [5] above. Exposing multiple queues
> allows load distribution for those apps which care about it.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
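
The userspace sketch referenced above: open one AF_XDP socket per RX
queue that the RSS hash spreads traffic over, using libbpf's xsk
helpers. This is an illustration under assumptions, not code from the
thread: the interface name "eth0" and NUM_QUEUES=4 are placeholders,
each socket gets its own UMEM (sharing a UMEM across queues needs
XDP_SHARED_UMEM support), the header location for xsk.h varies between
libbpf/libxdp versions, and fill-ring population plus the RX/TX
processing loop are omitted.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <bpf/xsk.h>	/* header name/location varies with libbpf version */

#define NUM_QUEUES 4	/* must match the RSS spread configured on the NIC */
#define NUM_FRAMES 4096
#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE

struct xsk_per_queue {
	void *buffer;
	struct xsk_umem *umem;
	struct xsk_ring_prod fq;
	struct xsk_ring_cons cq;
	struct xsk_socket *xsk;
	struct xsk_ring_cons rx;
	struct xsk_ring_prod tx;
};

int main(void)
{
	const char *ifname = "eth0";	/* example interface name */
	struct xsk_per_queue q[NUM_QUEUES] = { 0 };

	for (int i = 0; i < NUM_QUEUES; i++) {
		size_t size = NUM_FRAMES * FRAME_SIZE;

		/* One UMEM per socket/queue in this sketch */
		if (posix_memalign(&q[i].buffer, getpagesize(), size))
			exit(1);
		if (xsk_umem__create(&q[i].umem, q[i].buffer, size,
				     &q[i].fq, &q[i].cq, NULL))
			exit(1);

		/* Bind this socket to RX queue i; with the default config
		 * libbpf also loads (or reuses) a small XDP prog and puts
		 * the socket into its XSKMAP slot for queue i. */
		if (xsk_socket__create(&q[i].xsk, ifname, i, q[i].umem,
				       &q[i].rx, &q[i].tx, NULL))
			exit(1);
	}

	/* ... populate fill rings, poll() the sockets, consume RX ... */

	for (int i = 0; i < NUM_QUEUES; i++) {
		xsk_socket__delete(q[i].xsk);
		xsk_umem__delete(q[i].umem);
		free(q[i].buffer);
	}
	return 0;
}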