Re: [PATCH bpf-next v2 07/11] bpf: tcp: Support tcp_congestion_ops in bpf

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 23 Dec 2019 15:20:10 -0800

On Fri, Dec 20, 2019 at 10:26 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
>
> This patch makes "struct tcp_congestion_ops" to be the first user
> of BPF STRUCT_OPS.  It allows implementing a tcp_congestion_ops
> in bpf.
>
> The BPF implemented tcp_congestion_ops can be used like
> regular kernel tcp-cc through sysctl and setsockopt.  e.g.
> [root@arch-fb-vm1 bpf]# sysctl -a | egrep congestion
> net.ipv4.tcp_allowed_congestion_control = reno cubic bpf_cubic
> net.ipv4.tcp_available_congestion_control = reno bic cubic bpf_cubic
> net.ipv4.tcp_congestion_control = bpf_cubic
>
> There has been attempt to move the TCP CC to the user space
> (e.g. CCP in TCP).   The common arguments are faster turn around,
> get away from long-tail kernel versions in production...etc,
> which are legit points.
>
> BPF has been the continuous effort to join both kernel and
> userspace upsides together (e.g. XDP to gain the performance
> advantage without bypassing the kernel).  The recent BPF
> advancements (in particular BTF-aware verifier, BPF trampoline,
> BPF CO-RE...) made implementing kernel struct ops (e.g. tcp cc)
> possible in BPF.  It allows a faster turnaround for testing algorithm
> in the production while leveraging the existing (and continue growing)
> BPF feature/framework instead of building one specifically for
> userspace TCP CC.
>
> This patch allows write access to a few fields in tcp-sock
> (in bpf_tcp_ca_btf_struct_access()).
>
> The optional "get_info" is unsupported now.  It can be added
> later.  One possible way is to output the info with a btf-id
> to describe the content.
>
> Signed-off-by: Martin KaFai Lau <kafai@xxxxxx>
> ---
>  include/linux/filter.h            |   2 +
>  include/net/tcp.h                 |   1 +
>  kernel/bpf/bpf_struct_ops_types.h |   7 +-
>  net/core/filter.c                 |   2 +-
>  net/ipv4/Makefile                 |   4 +
>  net/ipv4/bpf_tcp_ca.c             | 226 ++++++++++++++++++++++++++++++
>  net/ipv4/tcp_cong.c               |  14 +-
>  net/ipv4/tcp_ipv4.c               |   6 +-
>  net/ipv4/tcp_minisocks.c          |   4 +-
>  net/ipv4/tcp_output.c             |   4 +-
>  10 files changed, 255 insertions(+), 15 deletions(-)
>  create mode 100644 net/ipv4/bpf_tcp_ca.c
>

Naming nits below. Other than that:

Acked-by: Andrii Nakryiko <andriin@xxxxxx>

[...]

> +static const struct btf_type *tcp_sock_type;
> +static u32 tcp_sock_id, sock_id;
> +
> +static int bpf_tcp_ca_init(struct btf *_btf_vmlinux)
> +{

there is no reason to pass anything but vmlinux's BTF to this
function, so I think just having "btf" as a name is OK.

> +       s32 type_id;
> +
> +       type_id = btf_find_by_name_kind(_btf_vmlinux, "sock", BTF_KIND_STRUCT);
> +       if (type_id < 0)
> +               return -EINVAL;
> +       sock_id = type_id;
> +
> +       type_id = btf_find_by_name_kind(_btf_vmlinux, "tcp_sock",
> +                                       BTF_KIND_STRUCT);
> +       if (type_id < 0)
> +               return -EINVAL;
> +       tcp_sock_id = type_id;
> +       tcp_sock_type = btf_type_by_id(_btf_vmlinux, tcp_sock_id);
> +
> +       return 0;
> +}
> +
> +static bool check_optional(u32 member_offset)

check_xxx is quite ambiguous, in general: is it a check that it is
optional or that it's not optional? How about using
is_optional/is_unsupported to make this clear?

> +{
> +       unsigned int i;
> +
> +       for (i = 0; i < ARRAY_SIZE(optional_ops); i++) {
> +               if (member_offset == optional_ops[i])
> +                       return true;
> +       }
> +
> +       return false;
> +}
> +

[...]