On Fri, Mar 22, 2024 at 2:34 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > > > On 3/22/24 1:16 PM, Andrii Nakryiko wrote: > > On Fri, Mar 22, 2024 at 12:18 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > >> > >> On 3/22/24 11:45 AM, Andrii Nakryiko wrote: > >>> On Tue, Mar 19, 2024 at 10:54 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > >>>> Add bpf_link support for sk_msg and sk_skb programs. We have an > >>>> internal request to support bpf_link for sk_msg programs so user > >>>> space can have a uniform handling with bpf_link based libbpf > >>>> APIs. Using bpf_link based libbpf API also has a benefit which > >>>> makes system robust by decoupling prog life cycle and > >>>> attachment life cycle. > >>>> > >>>> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx> > >>>> --- > >>>> include/linux/bpf.h | 13 +++ > >>>> include/uapi/linux/bpf.h | 10 ++ > >>>> kernel/bpf/syscall.c | 4 + > >>>> net/core/skmsg.c | 164 +++++++++++++++++++++++++++++++++ > >>>> net/core/sock_map.c | 6 +- > >>>> tools/include/uapi/linux/bpf.h | 10 ++ > >>>> 6 files changed, 203 insertions(+), 4 deletions(-) > >>>> > >>> [...] > >>> > >>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >>>> index 3c42b9f1bada..c5506cfca4f8 100644 > >>>> --- a/include/uapi/linux/bpf.h > >>>> +++ b/include/uapi/linux/bpf.h > >>>> @@ -1135,6 +1135,8 @@ enum bpf_link_type { > >>>> BPF_LINK_TYPE_TCX = 11, > >>>> BPF_LINK_TYPE_UPROBE_MULTI = 12, > >>>> BPF_LINK_TYPE_NETKIT = 13, > >>>> + BPF_LINK_TYPE_SK_MSG = 14, > >>>> + BPF_LINK_TYPE_SK_SKB = 15, > >>> they are both "sockmap attachments", so maybe we should just have > >>> something like BPF_LINK_TYPE_SOCKMAP ? > >> Yes, we could do this. Basically it represents all programs > >> which can be attached to sockmap. > >> > >>>> __MAX_BPF_LINK_TYPE, > >>>> }; > >>>> > >>>> @@ -6718,6 +6720,14 @@ struct bpf_link_info { > >>>> __u32 ifindex; > >>>> __u32 attach_type; > >>>> } netkit; > >>>> + struct { > >>>> + __u32 map_id; > >>>> + __u32 attach_type; > >>>> + } skmsg; > >>>> + struct { > >>>> + __u32 map_id; > >>>> + __u32 attach_type; > >>>> + } skskb; > >>> and then this would be also just one struct, instead of two identical > >>> ones duplicated > >> Right, we could do one with name 'sockmap'. > >> > >>>> }; > >>>> } __attribute__((aligned(8))); > >>>> > >>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > >>>> index ae2ff73bde7e..3d13eec5a30d 100644 > >>>> --- a/kernel/bpf/syscall.c > >>>> +++ b/kernel/bpf/syscall.c > >>>> @@ -5213,6 +5213,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr) > >>>> case BPF_PROG_TYPE_SK_LOOKUP: > >>>> ret = netns_bpf_link_create(attr, prog); > >>>> break; > >>>> + case BPF_PROG_TYPE_SK_MSG: > >>>> + case BPF_PROG_TYPE_SK_SKB: > >>>> + ret = bpf_sk_msg_skb_link_create(attr, prog); > >>>> + break; > >>>> #ifdef CONFIG_NET > >>>> case BPF_PROG_TYPE_XDP: > >>>> ret = bpf_xdp_link_attach(attr, prog); > >>>> diff --git a/net/core/skmsg.c b/net/core/skmsg.c > >>>> index 4d75ef9d24bf..1aa900ad54d7 100644 > >>>> --- a/net/core/skmsg.c > >>>> +++ b/net/core/skmsg.c > >>>> @@ -1256,3 +1256,167 @@ void sk_psock_stop_verdict(struct sock *sk, struct sk_psock *psock) > >>>> sk->sk_data_ready = psock->saved_data_ready; > >>>> psock->saved_data_ready = NULL; > >>>> } > >>>> + > >>>> +struct bpf_sk_msg_skb_link { > >>>> + struct bpf_link link; > >>>> + struct bpf_map *map; > >>>> + enum bpf_attach_type attach_type; > >>>> +}; > >>>> + > >>>> +static DEFINE_MUTEX(link_mutex); > >>> maybe more specific name, sockmap_link_mutex? link_mutex sounds very generic > >> Good idea. > >> > >>>> + > >>>> +static struct bpf_sk_msg_skb_link *bpf_sk_msg_skb_link(const struct bpf_link *link) > >>>> +{ > >>>> + return container_of(link, struct bpf_sk_msg_skb_link, link); > >>>> +} > >>>> + > >>> [...] > >>> > >>>> + attach_type = attr->link_create.attach_type; > >>>> + bpf_link_init(&sk_link->link, link_type, &bpf_sk_msg_skb_link_ops, prog); > >>>> + sk_link->map = map; > >>>> + sk_link->attach_type = attach_type; > >>>> + > >>>> + ret = bpf_link_prime(&sk_link->link, &link_primer); > >>>> + if (ret) { > >>>> + kfree(sk_link); > >>>> + goto out; > >>>> + } > >>>> + > >>>> + ret = sock_map_prog_update(map, prog, NULL, attach_type); > >>> Does anything prevent someone else do to remove this program from > >>> user-space, bypassing the link? It's a guarantee of a link that > >>> attachment won't be tampered with (except for SYS_ADMIN-only > >>> force-detachment, which is a completely separate thing). > >>> > >>> It feels like there should be some sort of protection for programs > >>> attached through sockmap link here. Just like we have this for XDP, > >>> for example, or any of cgroup BPF programs attached through BPF link. > >> Good point. I have a 'bpf_prog_inc(prog)' below, I could do a refcount increase > >> before sock_map_prog_update(), we then should be okay. > >> > > My point was that once you attach a program to sockmap with > > LINK_CREATE, someone else just doing bpf_prog_attac() shouldn't > > replace this program anymore. BPF link preserves *attachment > > lifetime*, not just the program lifetime. > > I see. I briefly looked at xdp and cgroup where both support > link-based attachment and program base attachment. > For xdp, indeed if link attachment is already done, > prog attach is not allowed. > cgroup seems more complicated and need further study to > confirm whether BPF link preserves attachment in all > cases or not. it does, I implemented it, it was an explicit design decision > But any way, since bpf_link is a new feature for > sockmap. It makes sense to follow the xdp-link > style to disallow bpf_prog_attach once link is > created. > > > > >>>> + if (ret) { > >>>> + bpf_link_cleanup(&link_primer); > >>>> + goto out; > >>>> + } > >>>> + > >>>> + bpf_prog_inc(prog); > >>>> + > >>>> + return bpf_link_settle(&link_primer); > >>>> + > >>>> +out: > >>>> + bpf_map_put_with_uref(map); > >>>> + return ret; > >>>> +} > >>> [...]