On Thu, May 16, 2019 at 09:41:34AM +0100, Lorenz Bauer wrote: > On Wed, 15 May 2019 at 18:16, Joe Stringer <joe@xxxxxxxxxxxxx> wrote: > > > > On Wed, May 15, 2019 at 8:11 AM Lorenz Bauer <lmb@xxxxxxxxxxxxxx> wrote: > > > > > > In the BPF-based TPROXY session with Joe Stringer [1], I mentioned > > > that the sk_lookup_* helpers currently return inconsistent results if > > > SK_REUSEPORT programs are in play. > > > > > > SK_REUSEPORT programs are a hook point in inet_lookup. They get access > > > to the full packet > > > that triggered the look up. To support this, inet_lookup gained a new > > > skb argument to provide such context. If skb is NULL, the SK_REUSEPORT > > > program is skipped and instead the socket is selected by its hash. > > > > > > The first problem is that not all callers to inet_lookup from BPF have > > > an skb, e.g. XDP. This means that a look up from XDP gives an > > > incorrect result. For now that is not a huge problem. However, once we > > > get sk_assign as proposed by Joe, we can end up circumventing > > > SK_REUSEPORT. > > > > To clarify a bit, the reason this is a problem is that a > > straightforward implementation may just consider passing the skb > > context into the sk_lookup_*() and through to the inet_lookup() so > > that it would run the SK_REUSEPORT BPF program for socket selection on > > the skb when the packet-path BPF program performs the socket lookup. > > However, as this paragraph describes, the skb context is not always > > available. > > > > > At the conference, someone suggested using a similar approach to the > > > work done on the flow dissector by Stanislav: create a dedicated > > > context sk_reuseport which can either take an skb or a plain pointer. > > > Patch up load_bytes to deal with both. Pass the context to > > > inet_lookup. > > > > > > This is when we hit the second problem: using the skb or XDP context > > > directly is incorrect, because it assumes that the relevant protocol > > > headers are at the start of the buffer. In our use case, the correct > > > headers are at an offset since we're inspecting encapsulated packets. > > > > > > The best solution I've come up with is to steal 17 bits from the flags > > > argument to sk_lookup_*, 1 bit for BPF_F_HEADERS_AT_OFFSET, 16bit for > > > the offset itself. > > > > FYI there's also the upper 32 bits of the netns_id parameter, another > > option would be to steal 16 bits from there. > > Or len, which is only 16 bits realistically. The offset doesn't really fit into > either of them very well, using flags seemed the cleanest to me. > Is there some best practice around this? > > > > > > Thoughts? > > > > Internally with skbs, we use `skb_pull()` to manage header offsets, > > could we do something similar with `bpf_xdp_adjust_head()` prior to > > the call to `bpf_sk_lookup_*()`? > > That would only work if it retained the contents of the skipped > buffer, and if there > was a way to undo the adjustment later. We're doing the sk_lookup to > decide whether to > accept or forward the packet, so at the point of the call we might still need > that data. Is that feasible with skb / XDP ctx? While discussing the solution for reuseport I propose to use progs/test_select_reuseport_kern.c as an example of realistic program. It reads tcp/udp header directly via ctx->data or via bpf_skb_load_bytes() including payload after the header. It also uses bpf_skb_load_bytes_relative() to fetch IP. I think if we're fixing the sk_lookup from XDP the above program would need to work. And I think we can make it work by adding new requirement that 'struct bpf_sock_tuple *' argument to bpf_sk_lookup_* must be a pointer to the packet and not a pointer to bpf program stack. Then helper can construct a fake skb and assign fake_skb->data = &bpf_sock_tuple_arg.sport It can check that struct bpf_sock_tuple * pointer is within 100-ish bytes from xdp->data and within xdp->data_end This way the reuseport program's assumption that ctx->data points to tcp/udp will be preserved and it can access it all including payload. This approach doesn't need to mess with xdp_adjust_head and adjust uapi to pass length. Existing progs/test_sk_lookup_kern.c will magically start working with XDP even when reuseport prog is attached. Thoughts?