On 11/27/24 11:07 AM, Maciej Fijalkowski wrote:
But the kfunc does not work on PTR_TO_CTX: it takes sk_buff directly, not __sk_buff. As I mentioned above, we use bpf_cast_to_kern_ctx(), and per my current limited understanding it overwrites reg->type to PTR_TO_BTF_ID | PTR_TRUSTED.
Can you try skipping the bpf_cast_to_kern_ctx() call and directly passing the "struct __sk_buff *skb" to "struct sk_buff *bpf_skb_acquire(struct __sk_buff *skb)"?
I tried to simplify the use case that the customer has, but I am a bit worried that it might only confuse people more :/ However, here it is:
No, not at all. I suspect the use case has some similarity to the net-timestamp patches (https://lore.kernel.org/bpf/20241028110535.82999-1-kerneljasonxing@xxxxxxxxx/), which use an skb tskey to associate/correlate different timestamps.
Please share the patch and the test case. It will be easier for others to help.
On the TC egress hook, the skb is stored in a map. The reason for picking a map over a linked list or rbtree is that we want to be able to access skbs via some index, say a hash. This is where we bump the skb's refcount via the acquire kfunc. During the TC ingress hook on the same interface, the skb that was previously stored in the map is retrieved, while the current skb in the hook's context carries the timestamp via metadata. We then pass the retrieved skb and the tstamp from metadata to skb_tstamp_tx() (another kfunc) and finally decrement the skb's refcount via the release kfunc. Anyway, since we are able to do similar operations on task_struct (holding it in a map via kptr), I don't see why we wouldn't allow ourselves to do the same on sk_buffs, no?
An skb holds other things, like dev and dst; e.g. someone may be trying to remove the netdevice or the route while the skb sits in the map. Overall, yes, the skb refcnt will eventually be decremented when the map is freed, as with other kptrs (e.g. task).