On Tue, Jan 29, 2019 at 9:05 PM David Ahern <dsahern@xxxxxxxxx> wrote:
>
> On 1/21/19 10:10 AM, Matheus Salgueiro Castanho wrote:
> > Hi all,
> >
> > I've recently begun playing around with BPF and XDP. Right now I'm
> > trying to write BPF programs to perform encap/decap of a custom
> > protocol I'm using to transport packets over the network, creating
> > something like a tunnel. The packet looks like this:
> >
> > | Outer encapsulation | Original packet |
> >
> > Upon reception, a program loaded on XDP removes this external
> > encapsulation, leaving only the original packet, which is processed by
> > the server and then sent back to the network. However, I need to
> > re-add this outer encapsulation before sending the packet.
> >
> > I found out I can do this at L3 using routing rules, like this:
> > ip route add <IP-address> encap bpf headroom <size> xmit
> > obj <bpf-obj> section <sec-name> dev <iface>
> >
> > The problem with this approach is that at this level I cannot add my
> > tunnel header properly, since it must come before the Ethernet header.
> > And from my tests, this does not work with raw sockets (I could be
> > mistaken, though).
> >
> > Ideally, this should be done right before the packet reaches the NIC,
> > at the lowest layer possible, since the Linux kernel does not
> > recognize this external encapsulation. I would then need something
> > like "XDP on TX", which as far as I know doesn't exist.
> >
> > I was able to implement this functionality at the Traffic Control (TC)
> > layer, but could not find a helper function on TC equivalent to
> > bpf_xdp_adjust_head(). I ended up using a "hack": calling
> > bpf_skb_vlan_push() several times and then reorganizing and rewriting
> > the initial headers. Unsurprisingly, performance was not very good,
> > and there were some side effects.
> >
> > My question is: is there a better way to do this encapsulation on TX?
> > If not, what would be the steps to implement a basic "XDP on TX" in
> > the Linux kernel to support this? Although I am new to kernel
> > programming, I've spent a good amount of time reading through
> > BPF-related pieces of code in the kernel, so I have a basic idea of
> > how things are implemented, but would like to hear from more
> > experienced developers.
> >
> > Thank you very much,
> > Matheus Castanho
> >
>
> Peter is looking at something similar but with the assumption that it is
> still ipv4/ipv6 based:
>
> http://patchwork.ozlabs.org/project/netdev/list/?series=88701
>
> I do not believe it is easy to relax the v4/v6 assumption.

I believe it is possible to prepend a non-IP L3 header that the kernel
understands, such as MPLS, in a manner similar to what I am trying to do
in my patches (I may actually look into MPLS encapping in LWT/BPF later).

If the encapping protocol is non-standard, I might try using
bpf_lwt_push_encap() + bpf_redirect() (with a new BPF_LWT_ENCAP_RAW or
something similar). This actually sounds like an interesting and
potentially useful route to generalize/expand my IP encapping efforts.

>
> Another option is to look at MPLS ingressing to a label switch path:
> net/mpls/mpls_iptunnel.c
>
> It does something very similar to what you want -- prepending an MPLS
> label stack to a packet and then transmitting. Not bpf/XDP but an option
> to consider.
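The head-adjustment model that Matheus relies on for decap with bpf_xdp_adjust_head(), and is looking for on TX, can be illustrated with a small userspace C simulation. This is a sketch under stated assumptions, not BPF code and not a kernel API: struct pkt_buf, pkt_adjust_head(), encap(), decap(), OUTER_HDR_LEN and the 256-byte buffer are hypothetical names and sizes chosen for illustration. It only mirrors the helper's sign convention (a negative delta grows the packet into the headroom, a positive delta strips bytes from the front); the kernel helper applies further checks that are omitted here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an XDP buffer: mem[] is one frame, bytes
 * before 'data' are headroom, and [data, data_end) is the packet. */
#define BUF_SIZE 256
#define OUTER_HDR_LEN 8 /* hypothetical size of the custom outer header */

struct pkt_buf {
    uint8_t mem[BUF_SIZE];
    size_t data;     /* offset of the first packet byte */
    size_t data_end; /* offset one past the last packet byte */
};

size_t pkt_len(const struct pkt_buf *p)
{
    return p->data_end - p->data;
}

/* Mimics the sign convention of bpf_xdp_adjust_head(): move the packet
 * start by delta bytes. Negative delta grows the packet into headroom,
 * positive delta strips bytes from the front. Returns 0 on success, -1
 * if the new start would leave the buffer or pass data_end. */
int pkt_adjust_head(struct pkt_buf *p, long delta)
{
    long new_data = (long)p->data + delta;

    if (new_data < 0 || new_data > (long)p->data_end)
        return -1;
    p->data = (size_t)new_data;
    return 0;
}

/* RX side: strip the outer header, leaving the original packet. */
int decap(struct pkt_buf *p)
{
    if (pkt_len(p) < OUTER_HDR_LEN)
        return -1;
    return pkt_adjust_head(p, OUTER_HDR_LEN);
}

/* TX side: reserve OUTER_HDR_LEN bytes of headroom in front of the
 * packet and copy the (hypothetical) outer header into them. */
int encap(struct pkt_buf *p, const uint8_t outer[OUTER_HDR_LEN])
{
    if (pkt_adjust_head(p, -OUTER_HDR_LEN) != 0)
        return -1;
    memcpy(&p->mem[p->data], outer, OUTER_HDR_LEN);
    return 0;
}
```

In a real XDP program, the data and data_end pointers must be reloaded from the context and bounds-checked again after bpf_xdp_adjust_head() returns, because the helper invalidates any packet-pointer checks the verifier saw earlier.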
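David's pointer to net/mpls/mpls_iptunnel.c concerns prepending an MPLS label stack before transmitting. As a rough illustration of what such a stack looks like on the wire, the userspace C sketch below encodes RFC 3032 label stack entries (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL, network byte order). lse_encode() and push_label_stack() are hypothetical helpers, not kernel functions; the shift values match the MPLS_LS_*_SHIFT constants in <linux/mpls.h>, and the traffic class is fixed at 0 for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit positions of an MPLS label stack entry (RFC 3032); the values
 * match the MPLS_LS_*_SHIFT constants in <linux/mpls.h>. */
#define LSE_LABEL_SHIFT 12
#define LSE_TC_SHIFT 9
#define LSE_S_SHIFT 8
#define LSE_TTL_SHIFT 0

/* Hypothetical helper: encode one 32-bit label stack entry in host
 * byte order, masking each field to its width. */
uint32_t lse_encode(uint32_t label, unsigned tc, int bos, unsigned ttl)
{
    return ((label & 0xFFFFFu) << LSE_LABEL_SHIFT) |
           ((tc & 0x7u) << LSE_TC_SHIFT) |
           ((bos ? 1u : 0u) << LSE_S_SHIFT) |
           ((ttl & 0xFFu) << LSE_TTL_SHIFT);
}

/* Hypothetical helper: write a label stack into out[] in network byte
 * order, outermost label first. Only the last (innermost) entry carries
 * the bottom-of-stack bit. Returns the number of bytes written, or 0 if
 * the stack is empty or out is too small. */
size_t push_label_stack(uint8_t *out, size_t out_len,
                        const uint32_t *labels, size_t n, unsigned ttl)
{
    if (n == 0 || out_len < n * 4)
        return 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t e = lse_encode(labels[i], 0, i == n - 1, ttl);

        out[4 * i + 0] = (uint8_t)(e >> 24);
        out[4 * i + 1] = (uint8_t)(e >> 16);
        out[4 * i + 2] = (uint8_t)(e >> 8);
        out[4 * i + 3] = (uint8_t)e;
    }
    return n * 4;
}
```

On Ethernet, these bytes would follow the MAC header with EtherType 0x8847 (MPLS unicast). Here they are only built in a caller-supplied buffer, whereas the mpls_iptunnel code performs the prepend on the packet inside the kernel.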