On Fri, Mar 22, 2024 at 12:18 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 3/22/24 11:45 AM, Andrii Nakryiko wrote:
> > On Tue, Mar 19, 2024 at 10:54 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >> Add bpf_link support for sk_msg and sk_skb programs. We have an
> >> internal request to support bpf_link for sk_msg programs so user
> >> space can have a uniform handling with bpf_link based libbpf
> >> APIs. Using bpf_link based libbpf API also has a benefit which
> >> makes system robust by decoupling prog life cycle and
> >> attachment life cycle.
> >>
> >> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> >> ---
> >>   include/linux/bpf.h            |  13 +++
> >>   include/uapi/linux/bpf.h       |  10 ++
> >>   kernel/bpf/syscall.c           |   4 +
> >>   net/core/skmsg.c               | 164 +++++++++++++++++++++++++++++++++
> >>   net/core/sock_map.c            |   6 +-
> >>   tools/include/uapi/linux/bpf.h |  10 ++
> >>   6 files changed, 203 insertions(+), 4 deletions(-)
> >>
> > [...]
> >
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index 3c42b9f1bada..c5506cfca4f8 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -1135,6 +1135,8 @@ enum bpf_link_type {
> >>       BPF_LINK_TYPE_TCX = 11,
> >>       BPF_LINK_TYPE_UPROBE_MULTI = 12,
> >>       BPF_LINK_TYPE_NETKIT = 13,
> >> +     BPF_LINK_TYPE_SK_MSG = 14,
> >> +     BPF_LINK_TYPE_SK_SKB = 15,
> > they are both "sockmap attachments", so maybe we should just have
> > something like BPF_LINK_TYPE_SOCKMAP ?
>
> Yes, we could do this. Basically it represents all programs
> which can be attached to sockmap.
>
> >
> >>       __MAX_BPF_LINK_TYPE,
> >>   };
> >>
> >> @@ -6718,6 +6720,14 @@ struct bpf_link_info {
> >>                       __u32 ifindex;
> >>                       __u32 attach_type;
> >>               } netkit;
> >> +             struct {
> >> +                     __u32 map_id;
> >> +                     __u32 attach_type;
> >> +             } skmsg;
> >> +             struct {
> >> +                     __u32 map_id;
> >> +                     __u32 attach_type;
> >> +             } skskb;
> > and then this would be also just one struct, instead of two identical
> > ones duplicated
>
> Right, we could do one with name 'sockmap'.
>
> >
> >>       };
> >>   } __attribute__((aligned(8)));
> >>
> >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >> index ae2ff73bde7e..3d13eec5a30d 100644
> >> --- a/kernel/bpf/syscall.c
> >> +++ b/kernel/bpf/syscall.c
> >> @@ -5213,6 +5213,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
> >>       case BPF_PROG_TYPE_SK_LOOKUP:
> >>               ret = netns_bpf_link_create(attr, prog);
> >>               break;
> >> +     case BPF_PROG_TYPE_SK_MSG:
> >> +     case BPF_PROG_TYPE_SK_SKB:
> >> +             ret = bpf_sk_msg_skb_link_create(attr, prog);
> >> +             break;
> >>   #ifdef CONFIG_NET
> >>       case BPF_PROG_TYPE_XDP:
> >>               ret = bpf_xdp_link_attach(attr, prog);
> >> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> >> index 4d75ef9d24bf..1aa900ad54d7 100644
> >> --- a/net/core/skmsg.c
> >> +++ b/net/core/skmsg.c
> >> @@ -1256,3 +1256,167 @@ void sk_psock_stop_verdict(struct sock *sk, struct sk_psock *psock)
> >>       sk->sk_data_ready = psock->saved_data_ready;
> >>       psock->saved_data_ready = NULL;
> >>   }
> >> +
> >> +struct bpf_sk_msg_skb_link {
> >> +     struct bpf_link link;
> >> +     struct bpf_map *map;
> >> +     enum bpf_attach_type attach_type;
> >> +};
> >> +
> >> +static DEFINE_MUTEX(link_mutex);
> > maybe more specific name, sockmap_link_mutex? link_mutex sounds very generic
>
> Good idea.
>
> >
> >> +
> >> +static struct bpf_sk_msg_skb_link *bpf_sk_msg_skb_link(const struct bpf_link *link)
> >> +{
> >> +     return container_of(link, struct bpf_sk_msg_skb_link, link);
> >> +}
> >> +
> > [...]
> >
> >> +     attach_type = attr->link_create.attach_type;
> >> +     bpf_link_init(&sk_link->link, link_type, &bpf_sk_msg_skb_link_ops, prog);
> >> +     sk_link->map = map;
> >> +     sk_link->attach_type = attach_type;
> >> +
> >> +     ret = bpf_link_prime(&sk_link->link, &link_primer);
> >> +     if (ret) {
> >> +             kfree(sk_link);
> >> +             goto out;
> >> +     }
> >> +
> >> +     ret = sock_map_prog_update(map, prog, NULL, attach_type);
> > Does anything prevent someone else do to remove this program from
> > user-space, bypassing the link? It's a guarantee of a link that
> > attachment won't be tampered with (except for SYS_ADMIN-only
> > force-detachment, which is a completely separate thing).
> >
> > It feels like there should be some sort of protection for programs
> > attached through sockmap link here. Just like we have this for XDP,
> > for example, or any of cgroup BPF programs attached through BPF link.
>
> Good point. I have a 'bpf_prog_inc(prog)' below, I could do a refcount increase
> before sock_map_prog_update(), we then should be okay.
>

My point was that once you attach a program to sockmap with
LINK_CREATE, someone else just doing bpf_prog_attach() shouldn't
be able to replace this program anymore. BPF link preserves
*attachment lifetime*, not just the program lifetime.

> >
> >> +     if (ret) {
> >> +             bpf_link_cleanup(&link_primer);
> >> +             goto out;
> >> +     }
> >> +
> >> +     bpf_prog_inc(prog);
> >> +
> >> +     return bpf_link_settle(&link_primer);
> >> +
> >> +out:
> >> +     bpf_map_put_with_uref(map);
> >> +     return ret;
> >> +}
> > [...]
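
To make the attachment-lifetime point concrete, here is a rough user-space
toy model, not kernel code. Every name in it (toy_prog, toy_slot, toy_link
and the toy_* functions) is made up for illustration, and refusing
link creation on an already occupied slot is a simplifying assumption, not
a statement about what the patch should do. It only shows the semantics I
have in mind: while a link owns the attach point, prog_attach/prog_detach
style calls get -EBUSY, and only releasing the link ends the attachment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not kernel types. */
struct toy_prog { int id; };

struct toy_link;

/* One attach point, e.g. a single attach type on one sockmap. */
struct toy_slot {
    struct toy_prog *prog;  /* attached program, or NULL */
    struct toy_link *link;  /* owning link, or NULL for a legacy attach */
};

struct toy_link {
    struct toy_slot *slot;
    struct toy_prog *prog;
};

/* Legacy-style attach: may replace the program only if no link owns the slot. */
int toy_prog_attach(struct toy_slot *slot, struct toy_prog *prog)
{
    if (slot->link)
        return -EBUSY;
    slot->prog = prog;
    return 0;
}

/* Legacy-style detach: likewise refused while a link owns the slot. */
int toy_prog_detach(struct toy_slot *slot)
{
    if (slot->link)
        return -EBUSY;
    slot->prog = NULL;
    return 0;
}

/* Link-style attach: records ownership so later legacy calls cannot tamper. */
int toy_link_create(struct toy_link *link, struct toy_slot *slot,
                    struct toy_prog *prog)
{
    if (slot->prog || slot->link)
        return -EBUSY;  /* simplification: refuse any occupied slot */
    link->slot = slot;
    link->prog = prog;
    slot->prog = prog;
    slot->link = link;
    return 0;
}

/* Releasing the link is what ends the attachment. */
void toy_link_release(struct toy_link *link)
{
    struct toy_slot *slot = link->slot;

    if (slot && slot->link == link) {
        slot->prog = NULL;
        slot->link = NULL;
    }
    link->slot = NULL;
}

/* Walk through the scenario from the discussion; returns 0 if all checks hold. */
int toy_demo(void)
{
    struct toy_prog a = { 1 }, b = { 2 };
    struct toy_slot slot = { NULL, NULL };
    struct toy_link link = { NULL, NULL };

    assert(toy_link_create(&link, &slot, &a) == 0);
    assert(toy_prog_attach(&slot, &b) == -EBUSY);  /* no replacement */
    assert(toy_prog_detach(&slot) == -EBUSY);      /* no bypass */
    assert(slot.prog == &a);

    toy_link_release(&link);
    assert(slot.prog == NULL);
    assert(toy_prog_attach(&slot, &b) == 0);       /* slot is free again */
    return 0;
}
```

Calling toy_demo() from a small main() runs the whole sequence. Note that
taking an extra program reference does not show up anywhere in this model:
it keeps the program alive, but it does nothing to stop the slot from being
overwritten, which is the part the link has to guarantee.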