Re: [External] Storing sk_buffs as kptrs in map

Martin KaFai Lau <martin.lau@xxxxxxxxx> · Wed, 27 Nov 2024 12:54:41 -0800

On 11/27/24 11:07 AM, Maciej Fijalkowski wrote:
> But the kfunc does not work on PTR_TO_CTX: it takes a struct sk_buff
> directly, not a __sk_buff. As I mentioned above, we use
> bpf_cast_to_kern_ctx(), and per my current, limited understanding it
> overwrites reg->type with PTR_TO_BTF_ID | PTR_TRUSTED.

Can you try skipping the bpf_cast_to_kern_ctx() call and directly passing the
"struct __sk_buff *skb" to "struct sk_buff *bpf_skb_acquire(struct __sk_buff *skb)"?

> I tried to simplify the use case that the customer has, but I am a bit
> worried that it might only confuse people more :/ However, here it is:

No, not at all. I suspect the use case has some similarity to the
net-timestamp patches
(https://lore.kernel.org/bpf/20241028110535.82999-1-kerneljasonxing@xxxxxxxxx/),
which use an skb's tskey to associate/correlate different timestamps.

Please share the patch and the test case. It will be easier for others to help.

> On the TC egress hook, the skb is stored in a map. The reason for picking a
> map over a linked list or rbtree is that we want to be able to access skbs
> via some index, say a hash. This is where we bump the skb's refcount via
> the acquire kfunc.

> During the TC ingress hook on the same interface, the skb that was
> previously stored in the map is retrieved; the current skb in the hook's
> context carries the timestamp via metadata. We then pass the retrieved skb
> and the tstamp from the metadata to skb_tstamp_tx() (another kfunc), and
> finally decrement the skb's refcount via the release kfunc.

> Anyway, since we are able to do similar operations on task_struct (holding
> it in a map via a kptr), I don't see a reason why we wouldn't allow
> ourselves to do the same with sk_buffs, no?

An skb holds other things, like dev and dst, and someone may be trying to
remove the netdevice, the route, etc. while the skb sits in the map. Overall,
yes, the skb refcnt will eventually be decremented when the map is freed, as
with other kptrs (e.g. task).
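To make the lifetime argument concrete, here is a minimal userspace C model (not BPF code) of an skb parked in a map as a referenced kptr. Everything in it is hypothetical: fake_skb, skb_acquire()/skb_release(), map_store()/map_take()/map_free(), NSLOTS and the tskey-based slot choice. atomic_exchange() stands in for bpf_kptr_xchg(), and the acquire/release functions stand in for the skb kfuncs under discussion. It shows that the map's reference keeps the skb alive after the stack drops its own, that the ingress side must release what it takes out, and that anything still parked is only released at map teardown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff; only the refcount is modeled. */
struct fake_skb {
	atomic_int users;    /* plays the role of skb->users */
	unsigned int tskey;  /* hypothetical key used to pick a map slot */
};

#define NSLOTS 64

/* Models a map whose values each contain a single kptr field. */
struct kptr_map {
	_Atomic(struct fake_skb *) slot[NSLOTS];
};

static int freed_count; /* how many fake skbs have been freed so far */

static struct fake_skb *fake_skb_alloc(unsigned int tskey)
{
	struct fake_skb *skb = malloc(sizeof(*skb));

	assert(skb != NULL);
	atomic_init(&skb->users, 1); /* the reference held by the stack */
	skb->tskey = tskey;
	return skb;
}

/* Analogue of an acquire kfunc: take one more reference. */
static struct fake_skb *skb_acquire(struct fake_skb *skb)
{
	atomic_fetch_add(&skb->users, 1);
	return skb;
}

/* Analogue of a release kfunc; returns 1 if this dropped the last ref. */
static int skb_release(struct fake_skb *skb)
{
	int old = atomic_fetch_sub(&skb->users, 1);

	assert(old > 0);
	if (old == 1) {
		free(skb);
		freed_count++;
		return 1;
	}
	return 0;
}

static void map_init(struct kptr_map *m)
{
	for (int i = 0; i < NSLOTS; i++)
		atomic_init(&m->slot[i], NULL);
}

/* TC egress: acquire a ref and move it into the slot, like bpf_kptr_xchg(). */
static void map_store(struct kptr_map *m, struct fake_skb *skb)
{
	struct fake_skb *old =
		atomic_exchange(&m->slot[skb->tskey % NSLOTS], skb_acquire(skb));

	if (old)
		skb_release(old); /* a displaced ref must be released, not leaked */
}

/* TC ingress: move the ref out of the slot; the caller now owns it. */
static struct fake_skb *map_take(struct kptr_map *m, unsigned int tskey)
{
	return atomic_exchange(&m->slot[tskey % NSLOTS], NULL);
}

/* Map teardown: drop any refs still parked in slots. */
static void map_free(struct kptr_map *m)
{
	for (int i = 0; i < NSLOTS; i++) {
		struct fake_skb *skb = atomic_exchange(&m->slot[i], NULL);

		if (skb)
			skb_release(skb);
	}
}
```

In a real BPF program it is the verifier, not a runtime counter, that forces the pointer taken out with bpf_kptr_xchg() to be released or stored back. The model also ignores the dev and dst references an skb carries, which are exactly what stays pinned while the skb waits in a slot.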