Re: [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 17 Jul 2020 at 11:35, Jakub Sitnicki <jakub@xxxxxxxxxxxxxx> wrote:
>
> Changelog
> =========
> v4 -> v5:
> - Enforce BPF prog return value to be SK_DROP or SK_PASS. (Andrii)
> - Simplify prog runners now that only SK_DROP/PASS can be returned.
> - Enable bpf_perf_event_output from the start. (Andrii)
> - Drop patch
>   "selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c"
> - Remove tests for narrow loads from context at an offset wider in size
>   than target field, while we are discussing how to fix it:
>   https://lore.kernel.org/bpf/20200710173123.427983-1-jakub@xxxxxxxxxxxxxx/
> - Rebase onto recent bpf-next (bfdfa51702de)
> - Other minor changes called out in per-patch changelogs,
>   see patches: 2, 4, 6, 13-15
> - Carried over Andrii's Acks where nothing changed.
>
> v3 -> v4:
> - Reduce BPF prog return codes to SK_DROP/SK_PASS (Lorenz)
> - Default to drop on illegal return value from BPF prog (Lorenz)
> - Extend bpf_sk_assign to accept NULL socket pointer.
> - Switch to saner return values and add docs for new prog_array API (Andrii)
> - Add support for narrow loads from BPF context fields (Yonghong)
> - Fix broken build when IPv6 is compiled as a module (kernel test robot)
> - Fix null/wild-ptr-deref on BPF context access
> - Rebase to recent bpf-next (eef8a42d6ce0)
> - Other minor changes called out in per-patch changelogs,
>   see patches 1-2, 4, 6, 8, 10-12, 14, 16
>
> v2 -> v3:
> - Switch to link-based program attachment
> - Support for multi-prog attachment
> - Ability to skip reuseport socket selection
> - Code on RX path is guarded by a static key
> - struct in6_addr's are no longer copied into BPF prog context
> - BPF prog context is initialized as late as possible
> - Changes called out in patches 1-2, 4, 6, 8, 10-14, 16
> - Patches dropped:
>   01/17 flow_dissector: Extract attach/detach/query helpers
>   03/17 inet: Store layer 4 protocol in inet_hashinfo
>   08/17 udp: Store layer 4 protocol in udp_table
>
> v1 -> v2:
> - Changes called out in patches 2, 13-15, 17
> - Rebase to recent bpf-next (b4563facdcae)
>
> RFCv2 -> v1:
>
> - Switch to fetching a socket from a map and selecting a socket with
>   bpf_sk_assign, instead of having a dedicated helper that does both.
> - Run reuseport logic on sockets selected by BPF sk_lookup.
> - Allow BPF sk_lookup to fail the lookup with no match.
> - Go back to having just 2 hash table lookups in UDP.
>
> RFCv1 -> RFCv2:
>
> - Make socket lookup redirection map-based. BPF program now uses a
>   dedicated helper and a SOCKARRAY map to select the socket to redirect to.
>   A consequence of this change is that bpf_inet_lookup context is now
>   read-only.
> - Look for connected UDP sockets before allowing redirection from BPF.
>   This makes connected UDP socket work as expected in the presence of
>   inet_lookup prog.
> - Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector,
>   the only other per-netns BPF prog type.
>
> Overview
> ========
>
> This series proposes a new BPF program type named BPF_PROG_TYPE_SK_LOOKUP,
> or BPF sk_lookup for short.
>
> BPF sk_lookup program runs when transport layer is looking up a listening
> socket for a new connection request (TCP), or when looking up an
> unconnected socket for a packet (UDP).
>
> This serves as a mechanism to overcome the limits of what bind() API allows
> to express. Two use-cases driving this work are:
>
>  (1) steer packets destined to an IP range, fixed port to a single socket
>
>      192.0.2.0/24, port 80 -> NGINX socket
>
>  (2) steer packets destined to an IP address, any port to a single socket
>
>      198.51.100.1, any port -> L7 proxy socket
>
> In its context, program receives information about the packet that
> triggered the socket lookup. Namely IP version, L4 protocol identifier, and
> address 4-tuple.
>
> To select a socket BPF program fetches it from a map holding socket
> references, like SOCKMAP or SOCKHASH, calls bpf_sk_assign(ctx, sk, ...)
> helper to record the selection, and returns SK_PASS code. Transport layer
> then uses the selected socket as a result of socket lookup.
>
> Alternatively, program can also fail the lookup (SK_DROP), or let the
> lookup continue as usual (SK_PASS without selecting a socket).
>
> This lets the user match packets with listening (TCP) or receiving (UDP)
> sockets freely at the last possible point on the receive path, where we
> know that packets are destined for local delivery after undergoing
> policing, filtering, and routing.
>
> Program is attached to a network namespace, similar to BPF flow_dissector.
> We add a new attach type, BPF_SK_LOOKUP, for this. Multiple programs can be
> attached at the same time, in which case their return values are aggregated
> according the rules outlined in patch #4 description.
>
> Series structure
> ================
>
> Patches are organized as so:
>
>  1: enables multiple link-based prog attachments for bpf-netns
>  2: introduces sk_lookup program type
>  3-4: hook up the program to run on ipv4/tcp socket lookup
>  5-6: hook up the program to run on ipv6/tcp socket lookup
>  7-8: hook up the program to run on ipv4/udp socket lookup
>  9-10: hook up the program to run on ipv6/udp socket lookup
>  11-13: libbpf & bpftool support for sk_lookup
>  14-15: verifier and selftests for sk_lookup
>
> Patches are also available on GH:
>
>   https://github.com/jsitnicki/linux/commits/bpf-inet-lookup-v5
>
> Follow-up work
> ==============
>
> I'll follow up with below items, which IMHO don't block the review:
>
> - benchmark results for udp6 small packet flood scenario,
> - user docs for new BPF prog type, Documentation/bpf/prog_sk_lookup.rst,
> - timeout for accept() in tests after extending network_helper.[ch].
>
> Thanks to the reviewers for their feedback to this patch series:
>
> Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> Cc: Andrii Nakryiko <andriin@xxxxxx>
> Cc: Lorenz Bauer <lmb@xxxxxxxxxxxxxx>
> Cc: Marek Majkowski <marek@xxxxxxxxxxxxxx>
> Cc: Martin KaFai Lau <kafai@xxxxxx>
> Cc: Yonghong Song <yhs@xxxxxx>
>
> -jkbs

Phew, I have to admit that at the patch that adds 2k lines of tests my
eyes glazed over a bit, but other than that: thank you for your hard
work!

For the series:
Reviewed-by: Lorenz Bauer <lmb@xxxxxxxxxxxxxx>

>
> [RFCv1] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@xxxxxxxxxxxxxx/
> [RFCv2] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@xxxxxxxxxxxxxx/
> [v1] https://lore.kernel.org/bpf/20200511185218.1422406-18-jakub@xxxxxxxxxxxxxx/
> [v2] https://lore.kernel.org/bpf/20200506125514.1020829-1-jakub@xxxxxxxxxxxxxx/
> [v3] https://lore.kernel.org/bpf/20200702092416.11961-1-jakub@xxxxxxxxxxxxxx/
> [v4] https://lore.kernel.org/bpf/20200713174654.642628-1-jakub@xxxxxxxxxxxxxx/
>
> Jakub Sitnicki (15):
>   bpf, netns: Handle multiple link attachments
>   bpf: Introduce SK_LOOKUP program type with a dedicated attach point
>   inet: Extract helper for selecting socket from reuseport group
>   inet: Run SK_LOOKUP BPF program on socket lookup
>   inet6: Extract helper for selecting socket from reuseport group
>   inet6: Run SK_LOOKUP BPF program on socket lookup
>   udp: Extract helper for selecting socket from reuseport group
>   udp: Run SK_LOOKUP BPF program on socket lookup
>   udp6: Extract helper for selecting socket from reuseport group
>   udp6: Run SK_LOOKUP BPF program on socket lookup
>   bpf: Sync linux/bpf.h to tools/
>   libbpf: Add support for SK_LOOKUP program type
>   tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type
>   selftests/bpf: Add verifier tests for bpf_sk_lookup context access
>   selftests/bpf: Tests for BPF_SK_LOOKUP attach point
>
>  include/linux/bpf-netns.h                     |    3 +
>  include/linux/bpf.h                           |    4 +
>  include/linux/bpf_types.h                     |    2 +
>  include/linux/filter.h                        |  147 ++
>  include/uapi/linux/bpf.h                      |   77 +
>  kernel/bpf/core.c                             |   55 +
>  kernel/bpf/net_namespace.c                    |  127 +-
>  kernel/bpf/syscall.c                          |    9 +
>  kernel/bpf/verifier.c                         |   13 +-
>  net/core/filter.c                             |  183 +++
>  net/ipv4/inet_hashtables.c                    |   60 +-
>  net/ipv4/udp.c                                |   93 +-
>  net/ipv6/inet6_hashtables.c                   |   66 +-
>  net/ipv6/udp.c                                |   97 +-
>  scripts/bpf_helpers_doc.py                    |    9 +-
>  .../bpftool/Documentation/bpftool-prog.rst    |    2 +-
>  tools/bpf/bpftool/bash-completion/bpftool     |    2 +-
>  tools/bpf/bpftool/common.c                    |    1 +
>  tools/bpf/bpftool/prog.c                      |    3 +-
>  tools/include/uapi/linux/bpf.h                |   77 +
>  tools/lib/bpf/libbpf.c                        |    3 +
>  tools/lib/bpf/libbpf.h                        |    2 +
>  tools/lib/bpf/libbpf.map                      |    2 +
>  tools/lib/bpf/libbpf_probes.c                 |    3 +
>  tools/testing/selftests/bpf/network_helpers.c |   58 +-
>  tools/testing/selftests/bpf/network_helpers.h |    2 +
>  .../selftests/bpf/prog_tests/sk_lookup.c      | 1282 +++++++++++++++++
>  .../selftests/bpf/progs/test_sk_lookup.c      |  641 +++++++++
>  .../selftests/bpf/verifier/ctx_sk_lookup.c    |  492 +++++++
>  29 files changed, 3418 insertions(+), 97 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_lookup.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_sk_lookup.c
>  create mode 100644 tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
>
> --
> 2.25.4
>


-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux