The use case is when user wants to use sk_buff pointers as kptrs against program types that take in __sk_buff struct as context. A pair of kfuncs for acquiring and releasing skb would look as follows: __bpf_kfunc struct sk_buff *bpf_skb_acquire(struct sk_buff *skb__ctx_ref) __bpf_kfunc void bpf_skb_release(struct sk_buff *skb__ctx_ref) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> --- kernel/bpf/verifier.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 60938b136365..b16a39d28f8a 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -11303,6 +11303,11 @@ static bool is_kfunc_arg_const_str(const struct btf *btf, const struct btf_param return btf_param_match_suffix(btf, arg, "__str"); } +static bool is_kfunc_arg_ctx_ref(const struct btf *btf, const struct btf_param *arg) +{ + return btf_param_match_suffix(btf, arg, "__ctx_ref"); +} + static bool is_kfunc_arg_scalar_with_name(const struct btf *btf, const struct btf_param *arg, const char *name) @@ -11599,6 +11604,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx]) return KF_ARG_PTR_TO_CTX; + if (is_kfunc_arg_ctx_ref(meta->btf, &args[argno])) + return KF_ARG_PTR_TO_BTF_ID; + /* In this function, we verify the kfunc's BTF as per the argument type, * leaving the rest of the verification with respect to the register * type to our caller. When a set of conditions hold in the BTF type of -- 2.34.1 Then below example progs store the skb on TC tx and retrieve it later on when TC rx hook is called. This is a ping-pong of packets between veth pair interfaces. We start with Tx on interface with progs attached and then receive the packets back. I have stripped the other logic so that we focus only on kptr storage: -------------------------------8<------------------------------- struct sk_buff *bpf_skb_acquire(struct sk_buff *skb) __ksym; void bpf_skb_release(struct sk_buff *skb) __ksym; void *bpf_cast_to_kern_ctx(void *) __ksym; u32 key = 0; u32 get_key = 0; struct map_value { struct sk_buff __kptr * skb; }; struct { __uint(type, BPF_MAP_TYPE_ARRAY); __type(key, int); __type(value, struct map_value); __uint(max_entries, 16); } skb_map SEC(".maps"); SEC("tc") int bpf_tc_ingress(struct __sk_buff *skb) { struct map_value *map_entry; struct sk_buff *tmp = NULL; map_entry = bpf_map_lookup_elem(&skb_map, &get_key); if (!map_entry) { bpf_printk("no entry at get key %d\n", get_key); return TC_ACT_SHOT; } tmp = bpf_kptr_xchg(&map_entry->skb, tmp); if (!tmp) return TC_ACT_UNSPEC; bpf_printk("retrieved skb %p from index %d\n", tmp, get_key); get_key++; if (get_key >= 16) get_key = 0; bpf_skb_release(tmp); return TC_ACT_OK; } SEC("tc") int bpf_tc_egress(struct __sk_buff *ctx) { struct sk_buff *skbk = bpf_cast_to_kern_ctx(ctx); struct map_value *map_entry; struct map_value tmp_entry; struct sk_buff *tmp; tmp_entry.skb = NULL; bpf_map_update_elem(&skb_map, &key, &tmp_entry, BPF_ANY); map_entry = bpf_map_lookup_elem(&skb_map, &key); if (!map_entry) return TC_ACT_SHOT; skbk = bpf_skb_acquire(skbk); if (!skbk) return TC_ACT_SHOT; tmp = bpf_kptr_xchg(&map_entry->skb, skbk); if (tmp) bpf_skb_release(tmp); bpf_printk("stored skb %p at index %d\n", skbk, key); key++; if (key >= 16) key = 0; return TC_ACT_OK; } char LICENSE[] SEC("license") = "GPL"; ------------------------------->8------------------------------- > > > I tried to simplify the use case that customer has, but I am a bit worried > > that it might only confuse people more :/ however, here it is: > > No. not at all. I suspect the use case has some similarity to the > net-timestamp patches (https://lore.kernel.org/bpf/20241028110535.82999-1-kerneljasonxing@xxxxxxxxx/) > which uses a skb tskey to associate/co-relate different timestamp. I was aware of this work and I will give this a deep look, thanks. > > > On TC egress hook skb is stored in a map - reason for picking it over the > > linked list or rbtree is that we want to be able to access skbs via some index, > > say a hash. This is where we bump the skb's refcount via acquire kfunc. > > > > During TC ingress hook on the same interface, the skb that was previously > > stored in map is retrieved, current skb that resides in the context of > > hook carries the timestamp via metadata. We then use the retrieved skb and > > tstamp from metadata on skb_tstamp_tx() (another kfunc) and finally > > decrement skb's refcount via release kfunc. > > > > > > Anyways, since we are able to do similar operations on task_struct > > (holding it in map via kptr), I don't see a reason why wouldn't we allow > > ourselves to do it on sk_buffs, no? > > skb holds other things like dev and dst, like someone may be trying to > remove the netdevice and route...etc. Overall, yes, the skb refcnt will > eventually be decremented when the map is freed like other kptr (e.g. task) > do. >