Re: [PATCH bpf-next v2 1/6] bpf: Add bpf_link support for sk_msg and sk_skb progs

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 22 Mar 2024 14:35:13 -0700



On Fri, Mar 22, 2024 at 2:34 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 3/22/24 1:16 PM, Andrii Nakryiko wrote:
> > On Fri, Mar 22, 2024 at 12:18 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >>
> >> On 3/22/24 11:45 AM, Andrii Nakryiko wrote:
> >>> On Tue, Mar 19, 2024 at 10:54 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >>>> Add bpf_link support for sk_msg and sk_skb programs. We have an
> >>>> internal request to support bpf_link for sk_msg programs so user
> >>>> space can have a uniform handling with bpf_link based libbpf
> >>>> APIs. Using bpf_link based libbpf API also has a benefit which
> >>>> makes system robust by decoupling prog life cycle and
> >>>> attachment life cycle.
> >>>>
> >>>> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> >>>> ---
> >>>>    include/linux/bpf.h            |  13 +++
> >>>>    include/uapi/linux/bpf.h       |  10 ++
> >>>>    kernel/bpf/syscall.c           |   4 +
> >>>>    net/core/skmsg.c               | 164 +++++++++++++++++++++++++++++++++
> >>>>    net/core/sock_map.c            |   6 +-
> >>>>    tools/include/uapi/linux/bpf.h |  10 ++
> >>>>    6 files changed, 203 insertions(+), 4 deletions(-)
> >>>>
> >>> [...]
> >>>
> >>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>>> index 3c42b9f1bada..c5506cfca4f8 100644
> >>>> --- a/include/uapi/linux/bpf.h
> >>>> +++ b/include/uapi/linux/bpf.h
> >>>> @@ -1135,6 +1135,8 @@ enum bpf_link_type {
> >>>>           BPF_LINK_TYPE_TCX = 11,
> >>>>           BPF_LINK_TYPE_UPROBE_MULTI = 12,
> >>>>           BPF_LINK_TYPE_NETKIT = 13,
> >>>> +       BPF_LINK_TYPE_SK_MSG = 14,
> >>>> +       BPF_LINK_TYPE_SK_SKB = 15,
> >>> they are both "sockmap attachments", so maybe we should just have
> >>> something like BPF_LINK_TYPE_SOCKMAP ?
> >> Yes, we could do this. Basically it represents all programs
> >> which can be attached to sockmap.
> >>
> >>>>           __MAX_BPF_LINK_TYPE,
> >>>>    };
> >>>>
> >>>> @@ -6718,6 +6720,14 @@ struct bpf_link_info {
> >>>>                           __u32 ifindex;
> >>>>                           __u32 attach_type;
> >>>>                   } netkit;
> >>>> +               struct {
> >>>> +                       __u32 map_id;
> >>>> +                       __u32 attach_type;
> >>>> +               } skmsg;
> >>>> +               struct {
> >>>> +                       __u32 map_id;
> >>>> +                       __u32 attach_type;
> >>>> +               } skskb;
> >>> and then this would be also just one struct, instead of two identical
> >>> ones duplicated
> >> Right, we could do one with name 'sockmap'.
> >>
> >>>>           };
> >>>>    } __attribute__((aligned(8)));
> >>>>
> >>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >>>> index ae2ff73bde7e..3d13eec5a30d 100644
> >>>> --- a/kernel/bpf/syscall.c
> >>>> +++ b/kernel/bpf/syscall.c
> >>>> @@ -5213,6 +5213,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
> >>>>           case BPF_PROG_TYPE_SK_LOOKUP:
> >>>>                   ret = netns_bpf_link_create(attr, prog);
> >>>>                   break;
> >>>> +       case BPF_PROG_TYPE_SK_MSG:
> >>>> +       case BPF_PROG_TYPE_SK_SKB:
> >>>> +               ret = bpf_sk_msg_skb_link_create(attr, prog);
> >>>> +               break;
> >>>>    #ifdef CONFIG_NET
> >>>>           case BPF_PROG_TYPE_XDP:
> >>>>                   ret = bpf_xdp_link_attach(attr, prog);
> >>>> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> >>>> index 4d75ef9d24bf..1aa900ad54d7 100644
> >>>> --- a/net/core/skmsg.c
> >>>> +++ b/net/core/skmsg.c
> >>>> @@ -1256,3 +1256,167 @@ void sk_psock_stop_verdict(struct sock *sk, struct sk_psock *psock)
> >>>>           sk->sk_data_ready = psock->saved_data_ready;
> >>>>           psock->saved_data_ready = NULL;
> >>>>    }
> >>>> +
> >>>> +struct bpf_sk_msg_skb_link {
> >>>> +       struct bpf_link link;
> >>>> +       struct bpf_map *map;
> >>>> +       enum bpf_attach_type attach_type;
> >>>> +};
> >>>> +
> >>>> +static DEFINE_MUTEX(link_mutex);
> >>> maybe more specific name, sockmap_link_mutex? link_mutex sounds very generic
> >> Good idea.
> >>
> >>>> +
> >>>> +static struct bpf_sk_msg_skb_link *bpf_sk_msg_skb_link(const struct bpf_link *link)
> >>>> +{
> >>>> +       return container_of(link, struct bpf_sk_msg_skb_link, link);
> >>>> +}
> >>>> +
> >>> [...]
> >>>
> >>>> +       attach_type = attr->link_create.attach_type;
> >>>> +       bpf_link_init(&sk_link->link, link_type, &bpf_sk_msg_skb_link_ops, prog);
> >>>> +       sk_link->map = map;
> >>>> +       sk_link->attach_type = attach_type;
> >>>> +
> >>>> +       ret = bpf_link_prime(&sk_link->link, &link_primer);
> >>>> +       if (ret) {
> >>>> +               kfree(sk_link);
> >>>> +               goto out;
> >>>> +       }
> >>>> +
> >>>> +       ret = sock_map_prog_update(map, prog, NULL, attach_type);
> >>> Does anything prevent someone else do to remove this program from
> >>> user-space, bypassing the link? It's a guarantee of a link that
> >>> attachment won't be tampered with (except for SYS_ADMIN-only
> >>> force-detachment, which is a completely separate thing).
> >>>
> >>> It feels like there should be some sort of protection for programs
> >>> attached through sockmap link here. Just like we have this for XDP,
> >>> for example, or any of cgroup BPF programs attached through BPF link.
> >> Good point. I have a 'bpf_prog_inc(prog)' below, I could do a refcount increase
> >> before sock_map_prog_update(), we then should be okay.
> >>
> > My point was that once you attach a program to sockmap with
> > LINK_CREATE, someone else just doing bpf_prog_attac() shouldn't
> > replace this program anymore. BPF link preserves *attachment
> > lifetime*, not just the program lifetime.
>
> I see. I briefly looked at xdp and cgroup where both support
> link-based attachment and program base attachment.
> For xdp, indeed if link attachment is already done,
> prog attach is not allowed.
> cgroup seems more complicated and need further study to
> confirm whether BPF link preserves attachment in all
> cases or not.

it does, I implemented it, it was an explicit design decision


> But any way, since bpf_link is a new feature for
> sockmap. It makes sense to follow the xdp-link
> style to disallow bpf_prog_attach once link is
> created.
>
> >
> >>>> +       if (ret) {
> >>>> +               bpf_link_cleanup(&link_primer);
> >>>> +               goto out;
> >>>> +       }
> >>>> +
> >>>> +       bpf_prog_inc(prog);
> >>>> +
> >>>> +       return bpf_link_settle(&link_primer);
> >>>> +
> >>>> +out:
> >>>> +       bpf_map_put_with_uref(map);
> >>>> +       return ret;
> >>>> +}
> >>> [...]