Re: [PATCH bpf-next v2 1/6] bpf: Add bpf_link support for sk_msg and sk_skb progs

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 22 Mar 2024 13:16:33 -0700

On Fri, Mar 22, 2024 at 12:18 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 3/22/24 11:45 AM, Andrii Nakryiko wrote:
> > On Tue, Mar 19, 2024 at 10:54 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >> Add bpf_link support for sk_msg and sk_skb programs. We have an
> >> internal request to support bpf_link for sk_msg programs so user
> >> space can have a uniform handling with bpf_link based libbpf
> >> APIs. Using bpf_link based libbpf API also has a benefit which
> >> makes system robust by decoupling prog life cycle and
> >> attachment life cycle.
> >>
> >> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> >> ---
> >>   include/linux/bpf.h            |  13 +++
> >>   include/uapi/linux/bpf.h       |  10 ++
> >>   kernel/bpf/syscall.c           |   4 +
> >>   net/core/skmsg.c               | 164 +++++++++++++++++++++++++++++++++
> >>   net/core/sock_map.c            |   6 +-
> >>   tools/include/uapi/linux/bpf.h |  10 ++
> >>   6 files changed, 203 insertions(+), 4 deletions(-)
> >>
> > [...]
> >
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index 3c42b9f1bada..c5506cfca4f8 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -1135,6 +1135,8 @@ enum bpf_link_type {
> >>          BPF_LINK_TYPE_TCX = 11,
> >>          BPF_LINK_TYPE_UPROBE_MULTI = 12,
> >>          BPF_LINK_TYPE_NETKIT = 13,
> >> +       BPF_LINK_TYPE_SK_MSG = 14,
> >> +       BPF_LINK_TYPE_SK_SKB = 15,
> > they are both "sockmap attachments", so maybe we should just have
> > something like BPF_LINK_TYPE_SOCKMAP ?
>
> Yes, we could do this. Basically it represents all programs
> which can be attached to sockmap.
>
> >
> >>          __MAX_BPF_LINK_TYPE,
> >>   };
> >>
> >> @@ -6718,6 +6720,14 @@ struct bpf_link_info {
> >>                          __u32 ifindex;
> >>                          __u32 attach_type;
> >>                  } netkit;
> >> +               struct {
> >> +                       __u32 map_id;
> >> +                       __u32 attach_type;
> >> +               } skmsg;
> >> +               struct {
> >> +                       __u32 map_id;
> >> +                       __u32 attach_type;
> >> +               } skskb;
> > and then this would be also just one struct, instead of two identical
> > ones duplicated
>
> Right, we could do one with name 'sockmap'.
>
> >
> >>          };
> >>   } __attribute__((aligned(8)));
> >>
> >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >> index ae2ff73bde7e..3d13eec5a30d 100644
> >> --- a/kernel/bpf/syscall.c
> >> +++ b/kernel/bpf/syscall.c
> >> @@ -5213,6 +5213,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
> >>          case BPF_PROG_TYPE_SK_LOOKUP:
> >>                  ret = netns_bpf_link_create(attr, prog);
> >>                  break;
> >> +       case BPF_PROG_TYPE_SK_MSG:
> >> +       case BPF_PROG_TYPE_SK_SKB:
> >> +               ret = bpf_sk_msg_skb_link_create(attr, prog);
> >> +               break;
> >>   #ifdef CONFIG_NET
> >>          case BPF_PROG_TYPE_XDP:
> >>                  ret = bpf_xdp_link_attach(attr, prog);
> >> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> >> index 4d75ef9d24bf..1aa900ad54d7 100644
> >> --- a/net/core/skmsg.c
> >> +++ b/net/core/skmsg.c
> >> @@ -1256,3 +1256,167 @@ void sk_psock_stop_verdict(struct sock *sk, struct sk_psock *psock)
> >>          sk->sk_data_ready = psock->saved_data_ready;
> >>          psock->saved_data_ready = NULL;
> >>   }
> >> +
> >> +struct bpf_sk_msg_skb_link {
> >> +       struct bpf_link link;
> >> +       struct bpf_map *map;
> >> +       enum bpf_attach_type attach_type;
> >> +};
> >> +
> >> +static DEFINE_MUTEX(link_mutex);
> > maybe more specific name, sockmap_link_mutex? link_mutex sounds very generic
>
> Good idea.
>
> >
> >> +
> >> +static struct bpf_sk_msg_skb_link *bpf_sk_msg_skb_link(const struct bpf_link *link)
> >> +{
> >> +       return container_of(link, struct bpf_sk_msg_skb_link, link);
> >> +}
> >> +
> > [...]
> >
> >> +       attach_type = attr->link_create.attach_type;
> >> +       bpf_link_init(&sk_link->link, link_type, &bpf_sk_msg_skb_link_ops, prog);
> >> +       sk_link->map = map;
> >> +       sk_link->attach_type = attach_type;
> >> +
> >> +       ret = bpf_link_prime(&sk_link->link, &link_primer);
> >> +       if (ret) {
> >> +               kfree(sk_link);
> >> +               goto out;
> >> +       }
> >> +
> >> +       ret = sock_map_prog_update(map, prog, NULL, attach_type);
> > Does anything prevent someone else do to remove this program from
> > user-space, bypassing the link? It's a guarantee of a link that
> > attachment won't be tampered with (except for SYS_ADMIN-only
> > force-detachment, which is a completely separate thing).
> >
> > It feels like there should be some sort of protection for programs
> > attached through sockmap link here. Just like we have this for XDP,
> > for example, or any of cgroup BPF programs attached through BPF link.
>
> Good point. I have a 'bpf_prog_inc(prog)' below, I could do a refcount increase
> before sock_map_prog_update(), we then should be okay.
>

My point was that once you attach a program to sockmap with
LINK_CREATE, someone else just doing bpf_prog_attac() shouldn't
replace this program anymore. BPF link preserves *attachment
lifetime*, not just the program lifetime.

> >
> >> +       if (ret) {
> >> +               bpf_link_cleanup(&link_primer);
> >> +               goto out;
> >> +       }
> >> +
> >> +       bpf_prog_inc(prog);
> >> +
> >> +       return bpf_link_settle(&link_primer);
> >> +
> >> +out:
> >> +       bpf_map_put_with_uref(map);
> >> +       return ret;
> >> +}
> > [...]