On 10/26/23 10:47 AM, Kui-Feng Lee wrote:
On 10/25/23 23:20, Daniel Borkmann wrote:
Hi Kui-Feng,
On 10/26/23 3:18 AM, Kui-Feng Lee wrote:
On 10/25/23 18:15, Kui-Feng Lee wrote:
On 10/25/23 15:09, Martin KaFai Lau wrote:
On 10/25/23 2:24 PM, Kui-Feng Lee wrote:
On 10/24/23 14:48, Daniel Borkmann wrote:
This work adds a new, minimal BPF-programmable device called "netkit"
(former PoC code-name "meta") we recently presented at LSF/MM/BPF. The
core idea is that BPF programs are executed within the drivers xmit routine
and therefore e.g. in case of containers/Pods moving BPF processing closer
to the source.
Sorry for intruding into this discussion! Although it is too late to
mentioned this since this patchset have been v4 already.
I notice netkit has introduced a new attach type. I wonder if it
possible to implement it as a new struct_ops type.
Could your elaborate more about what does this struct_ops type do and how
is it different from the SCHED_CLS bpf prog that the netkit is running?
I found the code has been landed.
Basing on the landed code and
the patchset of registering bpf struct_ops from modules that I
am working on, it will looks like what is done in following patch.
No changes on syscall, uapi and libbpf are required.
diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
index 7e484f9fd3ae..e4eafaf397bf 100644
--- a/drivers/net/netkit.c
+++ b/drivers/net/netkit.c
@@ -20,6 +20,7 @@ struct netkit {
struct bpf_mprog_entry __rcu *active;
enum netkit_action policy;
struct bpf_mprog_bundle bundle;
+ struct hlist_head ops_list;
/* Needed in slow-path */
enum netkit_mode mode;
@@ -27,6 +28,13 @@ struct netkit {
u32 headroom;
};
+struct netkit_ops {
+ struct hlist_node node;
+ int ifindex;
+
+ int (*xmit)(struct sk_buff *skb);
+};
+
struct netkit_link {
struct bpf_link link;
struct net_device *dev;
@@ -46,6 +54,22 @@ netkit_run(const struct bpf_mprog_entry *entry, struct
sk_buff *skb,
if (ret != NETKIT_NEXT)
break;
}
+
+ return ret;
+}
+
+static __always_inline int
+netkit_run_st_ops(const struct netkit *nk, struct sk_buff *skb,
+ enum netkit_action ret)
+{
+ struct netkit_ops *ops;
+
+ hlist_for_each_entry_rcu(ops, &nk->ops_list, node) {
+ ret = ops->xmit(skb);
+ if (ret != NETKIT_NEXT)
+ break;
+ }
+
return ret;
}
@@ -80,6 +104,8 @@ static netdev_tx_t netkit_xmit(struct sk_buff *skb,
struct net_device *dev)
entry = rcu_dereference(nk->active);
if (entry)
ret = netkit_run(entry, skb, ret);
+ if (ret == NETKIT_NEXT)
+ ret = netkit_run_st_ops(nk, skb, ret);
switch (ret) {
case NETKIT_NEXT:
case NETKIT_PASS:
I don't think it makes sense to cramp struct ops in here for what has been
solved already with the bpf_mprog interface in a more efficient way and with
control dependencies for the insertion (before/after relative programs/links).
The latter is in particular crucial for a multi-user interface when dealing
with network traffic (think for example: policy, forwarder, observability
prog, etc).
I don't mean to cramp two implementations together
and don't notice this patchset is already landed at beginning.
There are a few ways to track this. patchwork bot will send a landing message to
the list. There is a few mins lag time but I don't think this lags matter here.
You may want to check your inbox and ensure it gets through.
git always has the source of true also.
This patch is just for explanation of how it likes if it is implemented
with just struct_ops (without bpf_mprog).
Thanks for sharing a struct_ops code snippet. It is an interesting idea to embed
ifindex and other details in the struct.
Leaving it still needs verifier changes to make the PTR_TO_BTF_ID skb in
struct_ops to work like tc __sk_buff such that all existing tc-bpf prog will
work as is. Daniel has already mentioned the ordering API (bpf_mprog) that has
been discussed for a year and has already been used in tc-link which I hope it
will be extended to solve the xdp ordering also. I am also not convinced saving
two attach types (note the prog type is the same here) deserve to re-create
something in-parallel to tc-link and then require the same "skb" bpf dataplane
program to be administrated (attach/introspect...etc) differently.