On Thu, 2 Jul 2020 at 10:24, Jakub Sitnicki <jakub@xxxxxxxxxxxxxx> wrote:
>
> Overview
> ========
>
> (Same as in v2. Please skip to next section if you've read it.)
>
> This series proposes a new BPF program type named BPF_PROG_TYPE_SK_LOOKUP,
> or BPF sk_lookup for short.
>
> BPF sk_lookup program runs when transport layer is looking up a listening
> socket for a new connection request (TCP), or when looking up an
> unconnected socket for a packet (UDP).
>
> This serves as a mechanism to overcome the limits of what bind() API allows
> to express. Two use-cases driving this work are:
>
>  (1) steer packets destined to an IP range, fixed port to a single socket
>
>      192.0.2.0/24, port 80 -> NGINX socket
>
>  (2) steer packets destined to an IP address, any port to a single socket
>
>      198.51.100.1, any port -> L7 proxy socket
>
> In its context, program receives information about the packet that
> triggered the socket lookup. Namely IP version, L4 protocol identifier, and
> address 4-tuple.
>
> To select a socket BPF program fetches it from a map holding socket
> references, like SOCKMAP or SOCKHASH, calls bpf_sk_assign(ctx, sk, ...)
> helper to record the selection, and returns BPF_REDIRECT code. Transport
> layer then uses the selected socket as a result of socket lookup.
>
> Alternatively, program can also fail the lookup (BPF_DROP), or let the
> lookup continue as usual (BPF_OK).
>
> This lets the user match packets with listening (TCP) or receiving (UDP)
> sockets freely at the last possible point on the receive path, where we
> know that packets are destined for local delivery after undergoing
> policing, filtering, and routing.
>
> Program is attached to a network namespace, similar to BPF flow_dissector.
> We add a new attach type, BPF_SK_LOOKUP, for this.
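
For readers skimming the overview, the two steering use-cases can be
modelled in plain userspace C. This is only a sketch under stated
assumptions, not the BPF program from the patches: steer(), enum verdict,
enum sock_slot and the IP4() macro are names invented here for
illustration. The verdicts merely mirror BPF_OK and BPF_REDIRECT, and the
slots stand in for entries that a real program would fetch from a
SOCKMAP/SOCKHASH before calling bpf_sk_assign().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical verdicts; they only mirror the BPF_OK / BPF_REDIRECT
 * return codes named in the cover letter. */
enum verdict { VERDICT_OK, VERDICT_REDIRECT };

/* Hypothetical socket slots; stand-ins for SOCKMAP/SOCKHASH entries. */
enum sock_slot { SLOT_NONE = -1, SLOT_NGINX = 0, SLOT_PROXY = 1 };

struct lookup_result {
	enum verdict verdict;
	enum sock_slot slot;
};

/* Build an IPv4 address in host byte order (illustrative helper). */
#define IP4(a, b, c, d)                                   \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Userspace model of the two steering rules from the overview.
 * dst_ip4 and dst_port are in host byte order. */
static struct lookup_result steer(uint32_t dst_ip4, uint16_t dst_port)
{
	struct lookup_result res = { VERDICT_OK, SLOT_NONE };

	/* (1) 192.0.2.0/24, port 80 -> NGINX socket */
	if ((dst_ip4 & IP4(255, 255, 255, 0)) == IP4(192, 0, 2, 0) &&
	    dst_port == 80) {
		res.verdict = VERDICT_REDIRECT;
		res.slot = SLOT_NGINX;
		return res;
	}

	/* (2) 198.51.100.1, any port -> L7 proxy socket */
	if (dst_ip4 == IP4(198, 51, 100, 1)) {
		res.verdict = VERDICT_REDIRECT;
		res.slot = SLOT_PROXY;
		return res;
	}

	/* No rule matched: let the regular lookup continue. */
	return res;
}
```

In this model steer(IP4(192, 0, 2, 77), 80) selects SLOT_NGINX, while the
same address on port 81 falls through to VERDICT_OK, which corresponds to
letting the regular socket lookup run.
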
>
> Series structure
> ================
>
> Patches are organized as so:
>
>  1: enabled multiple link-based prog attachments for bpf-netns
>  2: introduces sk_lookup program type
>  3-4: hook up the program to run on ipv4/tcp socket lookup
>  5-6: hook up the program to run on ipv6/tcp socket lookup
>  7-8: hook up the program to run on ipv4/udp socket lookup
>  9-10: hook up the program to run on ipv6/udp socket lookup
>  11-13: libbpf & bpftool support for sk_lookup
>  14-16: verifier and selftests for sk_lookup
>
> Patches are also available on GH:
>
>   https://github.com/jsitnicki/linux/commits/bpf-inet-lookup-v3
>
> Performance considerations
> ==========================
>
> I'm re-running udp6 small packet flood test, the scenario for which we had
> performance concerns in [v2], to measure pps hit after the changes called
> out in change log below.
>
> Will follow up with results. But I'm posting the patches early for review
> since there is a fair amount of code changes.
>
> Further work
> ============
>
> - user docs for new prog type, Documentation/bpf/prog_sk_lookup.rst
>   I'm looking for consensus on multi-prog semantics outlined in patch #4
>   description before drafting the document.
>
> - timeout on accept() in tests
>   I need to extract a helper for it into network_helpers in
>   selftests/bpf/. Didn't want to make this series any longer.
>
> Note to maintainers
> ===================
>
> This patch series depends on bpf-netns multi-prog changes that went
> recently into 'bpf' [0]. It won't apply onto 'bpf-next' until 'bpf' gets
> merged into 'bpf-next'.
>
> Changelog
> =========
>
> v3 brings the following changes based on feedback:
>
> 1. switch to link-based program attachment,
> 2. support for multi-prog attachment,
> 3. ability to skip reuseport socket selection,
> 4. code on RX path is guarded by a static key,
> 5. struct in6_addr's are no longer copied into BPF prog context,
> 6. BPF prog context is initialized as late as possible.
>
> v2 -> v3:
> - Changes called out in patches 1-2, 4, 6, 8, 10-14, 16
> - Patches dropped:
>   01/17 flow_dissector: Extract attach/detach/query helpers
>   03/17 inet: Store layer 4 protocol in inet_hashinfo
>   08/17 udp: Store layer 4 protocol in udp_table
>
> v1 -> v2:
> - Changes called out in patches 2, 13-15, 17
> - Rebase to recent bpf-next (b4563facdcae)
>
> RFCv2 -> v1:
>
> - Switch to fetching a socket from a map and selecting a socket with
>   bpf_sk_assign, instead of having a dedicated helper that does both.
> - Run reuseport logic on sockets selected by BPF sk_lookup.
> - Allow BPF sk_lookup to fail the lookup with no match.
> - Go back to having just 2 hash table lookups in UDP.
>
> RFCv1 -> RFCv2:
>
> - Make socket lookup redirection map-based. BPF program now uses a
>   dedicated helper and a SOCKARRAY map to select the socket to redirect to.
>   A consequence of this change is that bpf_inet_lookup context is now
>   read-only.
> - Look for connected UDP sockets before allowing redirection from BPF.
>   This makes connected UDP socket work as expected in the presence of
>   inet_lookup prog.
> - Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector,
>   the only other per-netns BPF prog type.
>
> [RFCv1] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@xxxxxxxxxxxxxx/
> [RFCv2] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@xxxxxxxxxxxxxx/
> [v1] https://lore.kernel.org/bpf/20200511185218.1422406-18-jakub@xxxxxxxxxxxxxx/
> [v2] https://lore.kernel.org/bpf/20200506125514.1020829-1-jakub@xxxxxxxxxxxxxx/
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=951f38cf08350884e72e0936adf147a8d764cc5d
>
> Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> Cc: Andrii Nakryiko <andriin@xxxxxx>
> Cc: Lorenz Bauer <lmb@xxxxxxxxxxxxxx>
> Cc: Marek Majkowski <marek@xxxxxxxxxxxxxx>
> Cc: Martin KaFai Lau <kafai@xxxxxx>
>
> Jakub Sitnicki (16):
>   bpf, netns: Handle multiple link attachments
>   bpf: Introduce SK_LOOKUP program type with a dedicated attach point
>   inet: Extract helper for selecting socket from reuseport group
>   inet: Run SK_LOOKUP BPF program on socket lookup
>   inet6: Extract helper for selecting socket from reuseport group
>   inet6: Run SK_LOOKUP BPF program on socket lookup
>   udp: Extract helper for selecting socket from reuseport group
>   udp: Run SK_LOOKUP BPF program on socket lookup
>   udp6: Extract helper for selecting socket from reuseport group
>   udp6: Run SK_LOOKUP BPF program on socket lookup
>   bpf: Sync linux/bpf.h to tools/
>   libbpf: Add support for SK_LOOKUP program type
>   tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type
>   selftests/bpf: Add verifier tests for bpf_sk_lookup context access
>   selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c
>   selftests/bpf: Tests for BPF_SK_LOOKUP attach point

For the series:

Reviewed-by: Lorenz Bauer <lmb@xxxxxxxxxxxxxx>

>
>  include/linux/bpf-netns.h                     |    3 +
>  include/linux/bpf.h                           |   33 +
>  include/linux/bpf_types.h                     |    2 +
>  include/linux/filter.h                        |   99 ++
>  include/uapi/linux/bpf.h                      |   74 +
>  kernel/bpf/core.c                             |   22 +
>  kernel/bpf/net_namespace.c                    |  125 +-
>  kernel/bpf/syscall.c                          |    9 +
>  net/core/filter.c                             |  188 +++
>  net/ipv4/inet_hashtables.c                    |   60 +-
>  net/ipv4/udp.c                                |   93 +-
>  net/ipv6/inet6_hashtables.c                   |   66 +-
>  net/ipv6/udp.c                                |   97 +-
>  scripts/bpf_helpers_doc.py                    |    9 +-
>  tools/bpf/bpftool/common.c                    |    1 +
>  tools/bpf/bpftool/prog.c                      |    3 +-
>  tools/include/uapi/linux/bpf.h                |   74 +
>  tools/lib/bpf/libbpf.c                        |    3 +
>  tools/lib/bpf/libbpf.h                        |    2 +
>  tools/lib/bpf/libbpf.map                      |    2 +
>  tools/lib/bpf/libbpf_probes.c                 |    3 +
>  .../bpf/prog_tests/reference_tracking.c       |    2 +-
>  .../selftests/bpf/prog_tests/sk_lookup.c      | 1353 +++++++++++++++++
>  .../selftests/bpf/progs/test_ref_track_kern.c |  181 +++
>  .../selftests/bpf/progs/test_sk_lookup_kern.c |  462 ++++--
>  .../selftests/bpf/verifier/ctx_sk_lookup.c    |  219 +++
>  26 files changed, 2995 insertions(+), 190 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_lookup.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_ref_track_kern.c
>  create mode 100644 tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
>
> --
> 2.25.4
>

--
Lorenz Bauer | Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com