Re: Per-queue XDP programs, thoughts

Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx> · Tue, 16 Apr 2019 14:17:59 -0700

On Tue, 16 Apr 2019 09:45:24 +0200, Björn Töpel wrote:
> > > > If we'd like to slice a netdevice into multiple queues. Isn't macvlan
> > > > or similar *virtual* netdevices a better path, instead of introducing
> > > > yet another abstraction?  
> >
> > Yes, the question of use cases is extremely important.  It seems
> > Mellanox is working on "spawning devlink ports" IOW slicing a device
> > into subdevices.  Which is a great way to run bifurcated DPDK/netdev
> > applications :/  If that gets merged I think we have to recalculate
> > what purpose AF_XDP is going to serve, if any.
> 
> I really like the subdevice-think, but let's have the drivers in the
> kernel. I don't see how the XDP view (including AF_XDP) changes with
> subdevices. My view on AF_XDP is that it's a socket that can
> receive/send data efficiently from/to the kernel. What subdevice
> *might* change is the requirement for a per-queue XDP program.

My worry is that the sockets are not expressive enough.  You can't have
a flower rule that forwards to a socket.  You can't have a flower rule
which forwards to an RSS context (AFAIK).  We have a model for doing
those things with port netdevs (A(incorrectly)KA representors).

> > > That is actually the reason I want XDP per-queue, as it is a way to
> > > offload the filtering to the hardware.  And if the per-queue XDP-prog
> > > becomes simple enough, the hardware can eliminate and do everything in
> > > hardware (hopefully).
> > >  
> > > > The control plane should IMO be outside of the XDP program.  
> >
> > ENOCOMPUTE :)  XDP program is the BPF byte code, it's never control
> > plane.  Do you mean application should not control the "context/
> > channel/subdev" creation?  
> 
> Yes, but I'm not sure. I'd like to hear more opinions.
> 
> Let me try to think out loud here. Say that per-queue XDP programs
> exist. The main XDP program receives all packets and makes the
> decision that a certain flow should end up in say queue X, and that
> the hardware supports offloading that. Should the knobs to program the
> hardware be exposed via BPF or by some other mechanism (perf ring to
> userland daemon)? Further, consider setting the XDP program per queue:
> should that be done via XDP (the main XDP program has knowledge of all
> programs) or via, say, netlink (as XDP is today)? One could argue that
> the per-queue setup should be a map (like tail-calls).

This is a philosophical discussion reminiscent of Saeed's control map
proposal.

I don't like the idea of purposefully shoehorning the networking
configuration into special maps.  It should probably be judged on a
patch-by-patch basis, tho.
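
To make the "per-queue setup as a map (like tail-calls)" idea concrete,
here is a rough user-space model in plain C. It is not BPF and uses no
kernel API; every name in it (queue_progs, install_queue_prog, classify,
run_main_prog, the verdict enum, the port 4242 flow) is made up for
illustration. It only shows the shape of the design: a main program
classifies a packet to a queue, then dispatches through a table that
something else (the control plane) fills in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts, loosely mirroring XDP_PASS / XDP_DROP. */
enum verdict { VERDICT_PASS, VERDICT_DROP };

#define NUM_QUEUES 4

/* Toy packet: only the field the toy classifier looks at. */
struct packet {
	uint16_t dst_port;
};

typedef enum verdict (*queue_prog_fn)(const struct packet *pkt);

/* Per-queue program table, standing in for a tail-call style map.
 * An empty slot means "no per-queue program installed". */
static queue_prog_fn queue_progs[NUM_QUEUES];

/* Control-plane side: install (or clear, with NULL) a queue's program. */
void install_queue_prog(unsigned int queue, queue_prog_fn prog)
{
	assert(queue < NUM_QUEUES);
	queue_progs[queue] = prog;
}

/* Main-program side: steer one assumed flow (port 4242) to queue 1,
 * everything else to queue 0. */
unsigned int classify(const struct packet *pkt)
{
	return pkt->dst_port == 4242 ? 1 : 0;
}

/* Example per-queue program: drop everything it sees. */
enum verdict drop_all(const struct packet *pkt)
{
	(void)pkt;
	return VERDICT_DROP;
}

/* Main program: classify, then dispatch to the per-queue program if
 * one is installed; otherwise fall through and pass the packet. */
enum verdict run_main_prog(const struct packet *pkt)
{
	queue_prog_fn prog = queue_progs[classify(pkt)];

	return prog ? prog(pkt) : VERDICT_PASS;
}
```

In this model the data path only reads the table, and who gets to write
it is a separate choice, which is the part the map-vs-netlink question
is really about.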

> > You're not saying "it's not the XDP program
> > which should be making the classification", no?  XDP program
> > controlling the classification was _the_ reason why we liked AF_XDP :)  
> 
> XDP program not doing classification would be weird. But if there's a
> scenario where *everything for a certain HW filter* ends up in an
> AF_XDP queue, should we require an XDP program? I've been going back
> and forth here... :-)