On 3/22/24 11:45 AM, Andrii Nakryiko wrote:
On Tue, Mar 19, 2024 at 10:54 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
Add bpf_link support for sk_msg and sk_skb programs. We have an
internal request to support bpf_link for sk_msg programs so that
user space can handle them uniformly through the bpf_link-based
libbpf APIs. Using the bpf_link-based libbpf API also makes the
system more robust by decoupling the program life cycle from the
attachment life cycle.
Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
---
include/linux/bpf.h | 13 +++
include/uapi/linux/bpf.h | 10 ++
kernel/bpf/syscall.c | 4 +
net/core/skmsg.c | 164 +++++++++++++++++++++++++++++++++
net/core/sock_map.c | 6 +-
tools/include/uapi/linux/bpf.h | 10 ++
6 files changed, 203 insertions(+), 4 deletions(-)
[...]
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3c42b9f1bada..c5506cfca4f8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1135,6 +1135,8 @@ enum bpf_link_type {
BPF_LINK_TYPE_TCX = 11,
BPF_LINK_TYPE_UPROBE_MULTI = 12,
BPF_LINK_TYPE_NETKIT = 13,
+ BPF_LINK_TYPE_SK_MSG = 14,
+ BPF_LINK_TYPE_SK_SKB = 15,
These are both "sockmap attachments", so maybe we should just have
something like BPF_LINK_TYPE_SOCKMAP?

Yes, we could do this. It would then represent all programs
that can be attached to a sockmap.
__MAX_BPF_LINK_TYPE,
};
@@ -6718,6 +6720,14 @@ struct bpf_link_info {
__u32 ifindex;
__u32 attach_type;
} netkit;
+ struct {
+ __u32 map_id;
+ __u32 attach_type;
+ } skmsg;
+ struct {
+ __u32 map_id;
+ __u32 attach_type;
+ } skskb;
And then this would also be just one struct, instead of two
identical copies.

Right, we could use a single struct named 'sockmap'.
};
} __attribute__((aligned(8)));
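
To make the suggestion above concrete, here is a minimal, self-contained
sketch of what a merged link type and a single info struct might look like.
Everything prefixed with sketch_/SKETCH_ is a placeholder invented for
illustration, as is the value 14; only the map_id and attach_type fields
come from the patch. This is not the final UAPI.

```c
#include <stdint.h>

/* Hypothetical: one link type covering every program that can be
 * attached to a sockmap, instead of separate SK_MSG / SK_SKB values. */
enum sketch_link_type {
	SKETCH_LINK_TYPE_NETKIT = 13,
	SKETCH_LINK_TYPE_SOCKMAP = 14,	/* placeholder value */
	SKETCH_MAX_LINK_TYPE,
};

/* Hypothetical: one info struct instead of two identical ones. */
struct sketch_sockmap_link_info {
	uint32_t map_id;	/* id of the sockmap the program is attached to */
	uint32_t attach_type;	/* tells sk_msg and sk_skb attachments apart */
};

/* Fill the info roughly the way a fill_link_info callback might. */
static inline void sketch_fill_info(struct sketch_sockmap_link_info *info,
				    uint32_t map_id, uint32_t attach_type)
{
	info->map_id = map_id;
	info->attach_type = attach_type;
}
```

Since attach_type already identifies the kind of program, user space
would not lose any information by sharing one struct.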
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ae2ff73bde7e..3d13eec5a30d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5213,6 +5213,10 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
case BPF_PROG_TYPE_SK_LOOKUP:
ret = netns_bpf_link_create(attr, prog);
break;
+ case BPF_PROG_TYPE_SK_MSG:
+ case BPF_PROG_TYPE_SK_SKB:
+ ret = bpf_sk_msg_skb_link_create(attr, prog);
+ break;
#ifdef CONFIG_NET
case BPF_PROG_TYPE_XDP:
ret = bpf_xdp_link_attach(attr, prog);
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 4d75ef9d24bf..1aa900ad54d7 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1256,3 +1256,167 @@ void sk_psock_stop_verdict(struct sock *sk, struct sk_psock *psock)
sk->sk_data_ready = psock->saved_data_ready;
psock->saved_data_ready = NULL;
}
+
+struct bpf_sk_msg_skb_link {
+ struct bpf_link link;
+ struct bpf_map *map;
+ enum bpf_attach_type attach_type;
+};
+
+static DEFINE_MUTEX(link_mutex);
Maybe use a more specific name, e.g. sockmap_link_mutex? link_mutex
sounds very generic.

Good idea.
+
+static struct bpf_sk_msg_skb_link *bpf_sk_msg_skb_link(const struct bpf_link *link)
+{
+ return container_of(link, struct bpf_sk_msg_skb_link, link);
+}
+
[...]
+ attach_type = attr->link_create.attach_type;
+ bpf_link_init(&sk_link->link, link_type, &bpf_sk_msg_skb_link_ops, prog);
+ sk_link->map = map;
+ sk_link->attach_type = attach_type;
+
+ ret = bpf_link_prime(&sk_link->link, &link_primer);
+ if (ret) {
+ kfree(sk_link);
+ goto out;
+ }
+
+ ret = sock_map_prog_update(map, prog, NULL, attach_type);
Does anything prevent someone else from removing this program from
user space, bypassing the link? A link guarantees that the
attachment won't be tampered with (except for SYS_ADMIN-only
force-detachment, which is a completely separate thing).
It feels like there should be some sort of protection here for
programs attached through a sockmap link, just like we have for XDP,
for example, or for any cgroup BPF program attached through a BPF link.

Good point. I have a 'bpf_prog_inc(prog)' below; I could take the
refcount before sock_map_prog_update(), and then we should be okay.
+ if (ret) {
+ bpf_link_cleanup(&link_primer);
+ goto out;
+ }
+
+ bpf_prog_inc(prog);
+
+ return bpf_link_settle(&link_primer);
+
+out:
+ bpf_map_put_with_uref(map);
+ return ret;
+}
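
The protection asked about for sock_map_prog_update() above could be
modelled roughly as follows. This is a userspace sketch under stated
assumptions, not kernel code and not what this patch does: every model_*
name is hypothetical, and the idea is simply that each attachment slot
remembers its owning link and refuses changes from anyone else.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace model; all names below are hypothetical. */
struct model_prog { int id; };
struct model_link { int id; };

/* One attachment slot of a sockmap (e.g. the msg verdict program). */
struct model_slot {
	struct model_prog *prog;	/* attached program, or NULL */
	struct model_link *link;	/* owning link, or NULL if none */
};

/*
 * Replace slot->prog with 'prog' (NULL means detach).
 * 'old', when non-NULL, must match the currently attached program.
 * 'link' identifies the caller: NULL for the plain attach/detach path.
 */
static int model_prog_update(struct model_slot *slot, struct model_prog *prog,
			     struct model_prog *old, struct model_link *link)
{
	/* A slot owned by a link may only be changed through that link. */
	if (slot->link && slot->link != link)
		return -EBUSY;
	if (old && slot->prog != old)
		return -ENOENT;
	slot->prog = prog;
	slot->link = prog ? link : NULL;	/* detach clears ownership */
	return 0;
}
```

With this model, a plain detach (link == NULL) on a link-owned slot
fails with -EBUSY, while the link's own release path, which passes the
owning link, succeeds.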
[...]