On Wed, Apr 19, 2017 at 09:38:44PM -0700, John Fastabend wrote:
> On 17-04-19 07:56 PM, Alexei Starovoitov wrote:
> > On Thu, Apr 20, 2017 at 12:51:31AM +0200, Daniel Borkmann wrote:
> >>
> >> Is there a concrete reason that all the proposed future cases like sockets
> >> have to be handled within the very same XDP_REDIRECT return code? F.e. why
> >> not XDP_TX_NIC that only assumes ifindex as proposed in the patch, and future
> >> ones would get a different return code f.e. XDP_TX_SK only handling sockets
> >> when we get there implementation-wise?
> >
> > yeah. let's keep redirect to sockets, tunnels, crypto and exotic things
> > out of this discussion. XDP_REDIRECT should assume an L2 raw packet is
> > being redirected to another L2 netdev. If we make it too generic it will
> > lose performance.
> >
> > For cls_bpf the ifindex concept is symmetric. The program can access it
> > as skb->ifindex on receive and can redirect to another ifindex via the
> > bpf_redirect() helper. Since the netdev is not locked, that's actually a
> > big plus, since the container management control plane can simply delete
> > netns+veth and it goes away. The program can have a dangling ifindex (if
> > the control plane is buggy and didn't update the bpf side), but it's
> > harmless. Packets that redirect to a non-existing ifindex are dropped.
> > This approach is already understood and works well, so for XDP I suggest
> > using the same approach initially before starting to reinvent the wheel.
> > struct xdp_md needs an ifindex field and an xdp_redirect() helper that
> > redirects to an L2 netdev only. That's it. Simple and easy.
> > I think the main use case in John's and Jesper's minds is something like
> > xdpswitch, where packets are coming from VMs and from physical eths and
> > being redirected to either a physical eth or to a VM via the upcoming
> > vhost+xdp support. This xdp_md->ifindex + xdp_redirect(to_ifindex) will
> > solve it just fine.
>
> hmm I must be missing something, the bpf_redirect() helper should be used
> as a return statement, e.g.
>
>   return bpf_redirect(ifindex, flags);
>
> It's a bit awkward to use in any other way. You would have to ensure the
> program returns TC_ACT_REDIRECT in all cases, I presume. It seems incredibly
> fragile and gains nothing as far as I can tell. bpf_redirect uses per-cpu
> data and expects the core to call skb_do_redirect() to push the packet into
> the correct netdev. bpf_redirect() does not modify the skb->ifindex value,
> looking at the code now.
>
> Are you suggesting using xdp_md to store the ifindex value instead of a per
> cpu variable in the redirect helper?

no. I'm suggesting to store it in a per-cpu scratch area, just like cls_bpf
does.

> Do you really mean the xdp_md struct in the uapi headers?

yes, since 'ifindex' needs to be part of the xdp_md struct in a read-only
way, just like cls_bpf does it. Otherwise, if we attach the same program to
multiple taps, it won't know which tap the traffic is arriving on and won't
be able to redirect properly.

> I don't see why it needs to be in the UAPI at all. If we don't like per cpu
> variables it could be pushed as part of xdp_buff I guess.

It's not about liking or disliking a per-cpu scratch area.
My main point: it works just fine for cls_bpf, and I'm suggesting to do the
same for xdp_redirect, since no one ever complained about that bit of
cls_bpf.
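
For reference, a minimal sketch of the cls_bpf pattern described above
(TARGET_IFINDEX is a made-up constant for illustration; a real program would
look the target up in a map maintained by the control plane, and bpf_helpers.h
is assumed to come from libbpf):

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

#define TARGET_IFINDEX 4	/* hypothetical egress netdev */

SEC("classifier")
int cls_redirect(struct __sk_buff *skb)
{
	/* skb->ifindex is the (read-only) arriving netdev */
	if (skb->ifindex == TARGET_IFINDEX)
		return TC_ACT_OK;	/* don't bounce back out the target */

	/* bpf_redirect() stashes the target in a per-cpu scratch area and
	 * returns TC_ACT_REDIRECT; the core then calls skb_do_redirect().
	 * A dangling/non-existing ifindex simply results in a drop.
	 */
	return bpf_redirect(TARGET_IFINDEX, 0);
}

char _license[] SEC("license") = "GPL";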

> My suggestion is we could add an ifindex to the xdp_md structure and have
> the receiving NIC driver populate the value assuming it is useful to
> programs. But, if we use cls_bpf as a model then the xdp_md ifindex is more
> or less independent of the redirect helper.

exactly. The arriving ifindex is independent of the xdp_redirect helper.

> In my opinion we should avoid diverging cls bpf and xdp bpf in subtle ways
> like the handling of ifindex and redirect.

exactly. I'm saying the same thing.
I'm not sure which part of my proposal was so confusing. Sorry about that.
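
For completeness, a similarly minimal sketch of the XDP side of this model.
It is written using bpf_redirect() from an XDP program and the read-only
xdp_md->ingress_ifindex field, which is how this interface eventually landed;
in the thread above these are referred to as xdp_redirect() and
xdp_md->ifindex, and TARGET_IFINDEX is again just a placeholder:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define TARGET_IFINDEX 4	/* hypothetical egress netdev */

SEC("xdp")
int xdp_redirect_prog(struct xdp_md *ctx)
{
	/* the arriving netdev, read-only, same idea as skb->ifindex */
	if (ctx->ingress_ifindex == TARGET_IFINDEX)
		return XDP_PASS;

	/* same shape as cls_bpf: the helper records the target and the
	 * program returns XDP_REDIRECT for the driver/core to act on
	 */
	return bpf_redirect(TARGET_IFINDEX, 0);
}

char _license[] SEC("license") = "GPL";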