On Sun, Apr 09, 2017 at 01:35:28PM -0700, David Miller wrote:
>
> This provides a generic non-optimized XDP implementation when the
> device driver does not provide an optimized one.
>
> It is arguable that perhaps I should have required something like
> this as part of the initial XDP feature merge.
>
> I believe this is critical for two reasons:
>
> 1) Accessibility.  More people can play with XDP with fewer
>    dependencies.  Yes I know we have XDP support in virtio_net, but
>    that just creates another dependency for learning how to use this
>    facility.
>
>    I wrote this to make life easier for the XDP newbies.
>
> 2) As a model for what the expected semantics are.  If there is a
>    pure generic core implementation, it serves as a semantic example
>    for driver folks adding XDP support.
>
> This is just a rough draft and is untested.
>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
...
> +static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> +				     struct bpf_prog *xdp_prog)
> +{
> +	struct xdp_buff xdp;
> +	u32 act = XDP_DROP;
> +	void *orig_data;
> +	int hlen, off;
> +
> +	if (skb_linearize(skb))
> +		goto do_drop;

Do we need to force-disable GRO? Otherwise, if we linearize an
skb_is_gso() packet, it will be huge and not similar to normal XDP
packets. GSO probably needs to be disabled too, to avoid veth
surprises?

> +
> +	hlen = skb_headlen(skb);
> +	xdp.data = skb->data;

It probably should be

	hlen = skb_headlen(skb) + skb->mac_len;
	xdp.data = skb->data - skb->mac_len;

to make sure the XDP program is looking at the L2 header.

> +	xdp.data_end = xdp.data + hlen;
> +	xdp.data_hard_start = xdp.data - skb_headroom(skb);
> +	orig_data = xdp.data;
> +	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +
> +	off = xdp.data - orig_data;
> +	if (off)
> +		__skb_push(skb, off);

... and also restore the L2 header somehow and derive the new
skb->protocol? If we simply do __skb_pull(skb, skb->mac_len); like we
do with cls_bpf, it will not work correctly: if the program did an
ip->ipip encap (like our balancer does, and like the test in
tools/testing/selftests/bpf/test_xdp.c), the skb metadata fields will
be wrong. So we need to repeat eth_type_trans() here if
(xdp.data != orig_data).

In the cls_bpf case, when we mess with skb sizes we always adjust the
skb metadata fields in the helpers, so there it's fine and
__skb_pull(skb, skb->mac_len); is enough. Here we need to be a bit
more careful.

>  static int netif_receive_skb_internal(struct sk_buff *skb)
>  {
>  	int ret;
> @@ -4258,6 +4336,21 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
>
>  	rcu_read_lock();
>
> +	if (static_key_false(&generic_xdp_needed)) {
> +		struct bpf_prog *xdp_prog = rcu_dereference(skb->dev->xdp_prog);
> +
> +		if (xdp_prog) {
> +			u32 act = netif_receive_generic_xdp(skb, xdp_prog);

That's indeed the best attachment point in the stack. I was trying to
see whether it could be lowered into something like dev_gro_receive(),
but not everyone calls it. Another option is to put it into
eth_type_trans() itself; then there are no problems with GRO, L2
headers, and adjust_head, but changing all drivers is too much.

> +
> +			if (act != XDP_PASS) {
> +				rcu_read_unlock();
> +				if (act == XDP_TX)
> +					dev_queue_xmit(skb);

It should be fine. For cls_bpf we do a recursion check in
__bpf_tx_skb(), but I forgot the specific details. Maybe here it's
fine as-is. Daniel, do we need a recursion check here?
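Putting the comments above together, roughly what I have in mind for
the function (completely untested sketch; it assumes the caller still
frees the skb on XDP_DROP as in your patch, and it derives the mac
length from skb_mac_header() just to be safe this early in the
receive path):

static u32 netif_receive_generic_xdp(struct sk_buff *skb,
				     struct bpf_prog *xdp_prog)
{
	struct xdp_buff xdp;
	void *orig_data;
	u32 mac_len, act;
	int off;

	if (skb_linearize(skb))
		return XDP_DROP;

	/* Let the program see the packet from the l2 header on,
	 * exactly like on the raw-buffer path.
	 */
	mac_len = skb->data - skb_mac_header(skb);
	__skb_push(skb, mac_len);

	xdp.data = skb->data;
	xdp.data_end = xdp.data + skb_headlen(skb);
	xdp.data_hard_start = skb->data - skb_headroom(skb);
	orig_data = xdp.data;

	act = bpf_prog_run_xdp(xdp_prog, &xdp);

	/* Follow a bpf_xdp_adjust_head() move of the packet start. */
	off = xdp.data - orig_data;
	if (off > 0)
		__skb_pull(skb, off);
	else if (off < 0)
		__skb_push(skb, -off);

	if (xdp.data != orig_data) {
		/* The program changed the l2 header (e.g. ip->ipip
		 * encap), so re-derive skb->protocol and re-pull the
		 * new l2 header via eth_type_trans().
		 */
		skb->protocol = eth_type_trans(skb, skb->dev);
	} else {
		/* Nothing moved, just undo the initial push. */
		__skb_pull(skb, mac_len);
	}

	/* Note: for XDP_TX the caller would still need the l2 header
	 * pushed back before dev_queue_xmit(); ignoring that here.
	 */
	return act;
}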
> @@ -6725,14 +6819,16 @@ int dev_change_xdp_fd(struct net_device *dev, int fd, u32 flags)
>
>  	ASSERT_RTNL();
>
> -	if (!ops->ndo_xdp)
> -		return -EOPNOTSUPP;
> +	xdp_op = ops->ndo_xdp;
> +	if (!xdp_op)
> +		xdp_op = generic_xdp_install;

I suspect there will always be drivers that don't support XDP (like
e100), so this generic_xdp_install() will stay with us forever.

Since it will stay, can we enable it for XDP-enabled drivers too? That
would allow us to test the raw XDP and skb-based paths neck and neck.

Today bpf newbies typically start developing with cls_bpf, since it
can run on tap/veth, and then refactor the program to XDP.
Unfortunately cls_bpf and XDP programs are substantially different, so
it pretty much means the cls_bpf prog is a throwaway. If we can add a
flag to this XDP netlink attach command that says 'enable skb-based
xdp', we'll have exactly the same program running on the raw DMA
buffer and on the skb, which will help with developing on veth/tap and
moving the same prog to a physical eth0 later. And users will be able
to switch between skb-based mode on eth0 and raw-buffer mode back and
forth to see the perf difference (and hopefully nothing else).

Another advantage is that it will help flush out the differences
between the skb- and raw-buffer modes in the drivers that already
support XDP. Yet another benefit is that it will allow measuring the
cost of the skb-alloc path.

Right now we have the XDP_FLAGS_UPDATE_IF_NOEXIST flag. We could add
something like an XDP_FLAGS_SKB_MODE flag for this purpose, and in the
drivers that don't support XDP at the moment, this flag would be
assumed automatically.

Thoughts?
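For concreteness, roughly what I mean on top of the quoted hunk
(XDP_FLAGS_SKB_MODE is just a placeholder name here, and the exact
value/semantics are of course up for discussion):

/* include/uapi/linux/if_link.h */
#define XDP_FLAGS_UPDATE_IF_NOEXIST	(1U << 0)
#define XDP_FLAGS_SKB_MODE		(1U << 1)

/* net/core/dev.c: dev_change_xdp_fd() */
	xdp_op = ops->ndo_xdp;
	/* Use the generic skb-based hook when the driver has no native
	 * XDP support, or when userspace explicitly asks for it, so
	 * the exact same program can be compared on both paths.
	 */
	if (!xdp_op || (flags & XDP_FLAGS_SKB_MODE))
		xdp_op = generic_xdp_install;

That way the loader only needs to set one extra netlink flag to flip
the same program between the two modes.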