On Sat, 15 Aug 2020 09:41:02 +0200 "Jason A. Donenfeld" <Jason@xxxxxxxxx> wrote: > A user reported that packets from wireguard were possibly ignored by XDP > [1]. Another user reported that modifying packets from layer 3 > interfaces results in impossible to diagnose drops. > > Apparently, the generic skb xdp handler path seems to assume that > packets will always have an ethernet header, which really isn't always > the case for layer 3 packets, which are produced by multiple drivers. > This patch fixes the oversight. If the mac_len is 0 and so is > hard_header_len, then we know that the skb is a layer 3 packet, and in > that case prepend a pseudo ethhdr to the packet whose h_proto is copied > from skb->protocol, which will have the appropriate v4 or v6 ethertype. > This allows us to keep XDP programs' assumption correct about packets > always having that ethernet header, so that existing code doesn't break, > while still allowing layer 3 devices to use the generic XDP handler. > > We push on the ethernet header and then pull it right off and set > mac_len to the ethernet header size, so that the rest of the XDP code > does not need any changes. That is, it makes it so that the skb has its > ethernet header just before the data pointer, of size ETH_HLEN. > > Previous discussions have included the point that maybe XDP should just > be intentionally broken on layer 3 interfaces, by design, and that layer > 3 people should be using cls_bpf. However, I think there are good > grounds to reconsider this perspective: > > - Complicated deployments wind up applying XDP modifications to a > variety of different devices on a given host, some of which are using > specialized ethernet cards and other ones using virtual layer 3 > interfaces, such as WireGuard. Being able to apply one codebase to > each of these winds up being essential. > > - cls_bpf does not support the same feature set as XDP, and operates at > a slightly different stage in the networking stack. You may reply, > "then add all the features you want to cls_bpf", but that seems to be > missing the point, and would still result in there being two ways to > do everything, which is not desirable for anyone actually _using_ this > code. > > - While XDP was originally made for hardware offloading, and while many > look disdainfully upon the generic mode, it nevertheless remains a > highly useful and popular way of adding bespoke packet > transformations, and from that perspective, a difference between layer > 2 and layer 3 packets is immaterial if the user is primarily concerned > with transformations to layer 3 and beyond. > > - It's not impossible to imagine layer 3 hardware (e.g. a WireGuard PCIe > card) including eBPF/XDP functionality built-in. In that case, why > limit XDP as a technology to only layer 2? Then, having generic XDP > work for layer 3 would naturally fit as well. > > [1] https://lore.kernel.org/wireguard/M5WzVK5--3-2@xxxxxxx/ > > Reported-by: Thomas Ptacek <thomas@xxxxxxxxxxxxxx> > Reported-by: Adhipati Blambangan <adhipati@xxxxxxx> > Cc: David Ahern <dsahern@xxxxxxxxx> > Cc: Toke Høiland-Jørgensen <toke@xxxxxxxxxx> > Cc: Jakub Kicinski <kuba@xxxxxxxxxx> > Cc: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> > Cc: Jesper Dangaard Brouer <brouer@xxxxxxxxxx> > Cc: John Fastabend <john.fastabend@xxxxxxxxx> > Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx> > Cc: David S. Miller <davem@xxxxxxxxxxxxx> > Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx> > --- > > I had originally dropped this patch, but the issue kept coming up in > user reports, so here's a v4 of it. Testing of it is still rather slim, > but hopefully that will change in the coming days. > > Changes v5->v6: > - The fix to the skb->protocol changing case is now in a separate > stand-alone patch, and removed from this one, so that it can be > evaluated separately. > > Changes v4->v5: > - Rather than tracking in a messy manner whether the skb is l3, we just > do the check once, and then adjust the skb geometry to be identical to > the l2 case. This simplifies the code quite a bit. > - Fix a preexisting bug where the l2 header remained attached if > skb->protocol was updated. > > Changes v3->v4: > - We now preserve the same logic for XDP_TX/XDP_REDIRECT as before. > - hard_header_len is checked in addition to mac_len. > > net/core/dev.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 151f1651439f..79c15f4244e6 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4630,6 +4630,18 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb, > * header. > */ > mac_len = skb->data - skb_mac_header(skb); > + if (!mac_len && !skb->dev->hard_header_len) { > + /* For l3 packets, we push on a fake mac header, and then > + * pull it off again, so that it has the same skb geometry > + * as for the l2 case. > + */ > + eth = skb_push(skb, ETH_HLEN); > + eth_zero_addr(eth->h_source); > + eth_zero_addr(eth->h_dest); > + eth->h_proto = skb->protocol; > + __skb_pull(skb, ETH_HLEN); > + mac_len = ETH_HLEN; > + } You are consuming a little bit of the headroom here. > hlen = skb_headlen(skb) + mac_len; > xdp->data = skb->data - mac_len; > xdp->data_meta = xdp->data; The XDP-prog is allowed to change eth->h_proto. Later (in code) we detect this and update skb->protocol with the new protocol. What will happen if my XDP-prog adds a VLAN header? The selftest tools/testing/selftests/bpf/test_xdp_vlan.sh test these situations. You can use it as an example, and write/construct a test that does the same for your Wireguard devices. As minimum you need to provide such a selftest together with this patch. Generally speaking, IMHO generic-XDP was a mistake, because it is hard to maintain and slows down the development of XDP. (I have a number of fixes in my TODO backlog for generic-XDP). Adding this will just give us more corner-cases that need to be maintained. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer