On Tue, Nov 26, 2024 at 12:47:09PM -0800, Martin KaFai Lau wrote:
> On 11/26/24 11:56 AM, Amery Hung wrote:
> > > I have a use case where I would like to store sk_buff pointers as
> > > kptrs in an eBPF map. To do so, I am borrowing the skb kfuncs for
> > > acquire/release/destroy from Amery Hung's bpf qdisc set [0], but
> > > they are registered for BPF_PROG_TYPE_SCHED_CLS programs.
> > >
> > > TL;DR - due to the following callstack:
> > >
> > > do_check()
> > >   check_kfunc_call()
> > >     check_kfunc_args()
> > >       get_kfunc_ptr_arg_type()
> > >         btf_is_prog_ctx_type()
> > >           btf_is_projection_of() -- returns true
> > >
> > > the sk_buff argument is being interpreted as KF_ARG_PTR_TO_CTX, but
> > > what we have there is KF_ARG_PTR_TO_BTF_ID. The verifier is unhappy
> > > about it. Should
>
> I don't think I fully understand "what we have there is
> KF_ARG_PTR_TO_BTF_ID". I am trying to guess you meant what we have
> there in the reg->type is in (PTR_TO_BTF_ID | PTR_TRUSTED).

Yes, sorry for taking the shortcut here.

> It makes sense to have "struct sk_buff __kptr *" instead of "struct
> __sk_buff __kptr *". However, the get_kfunc_ptr_arg_type() is expecting
> KF_ARG_PTR_TO_CTX because the prog type is BPF_PROG_TYPE_SCHED_CLS.

Yes, I have sk_buff as the kfunc's arg; I am using bpf_cast_to_kern_ctx()
to get the __sk_buff -> sk_buff conversion.

> From a very quick look, under the "case KF_ARG_PTR_TO_CTX:" in
> check_kfunc_args(), I think it needs to teach the verifier that the
> reg->type with a trusted PTR_TO_BTF_ID ("struct sk_buff *") can be used
> as the PTR_TO_CTX.

But the kfunc does not work on PTR_TO_CTX - it takes sk_buff directly,
not __sk_buff. As I mentioned above, we use bpf_cast_to_kern_ctx(), and
per my current limited understanding it overwrites the reg->type to
PTR_TO_BTF_ID | PTR_TRUSTED. However, as you said, due to the prog type
used and sk_buff being the arg's type, get_kfunc_ptr_arg_type()
interprets the kfunc's arg as KF_ARG_PTR_TO_CTX.
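To make the failure mode concrete, here is a minimal sketch of the
program side that trips this. The acquire/release kfunc names are the
ones borrowed from the bpf qdisc set and are illustrative only; this is
not a complete program:

```c
/* Sketch only - kfunc names/signatures borrowed from the bpf qdisc set
 * and subject to change. */
struct sk_buff *bpf_skb_acquire(struct sk_buff *skb) __ksym;
void bpf_skb_release(struct sk_buff *skb) __ksym;

SEC("tc")
int egress(struct __sk_buff *ctx)
{
	/* Trusted PTR_TO_BTF_ID pointing at the kernel sk_buff. */
	struct sk_buff *skb = bpf_cast_to_kern_ctx(ctx);

	/* Because the prog type is BPF_PROG_TYPE_SCHED_CLS and the arg's
	 * BTF type is sk_buff, get_kfunc_ptr_arg_type() classifies the
	 * arg as KF_ARG_PTR_TO_CTX and the verifier rejects the call,
	 * even though reg->type is PTR_TO_BTF_ID | PTR_TRUSTED here. */
	skb = bpf_skb_acquire(skb);
	if (skb)
		bpf_skb_release(skb);
	return TC_ACT_OK;
}
```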
> > > this be worked around via some typedef or by adding the mentioned
> > > kfuncs to special_kfunc_list? If the latter, then what else needs
> > > to be handled?
> > >
> > > Commenting out the sk_buff part from btf_is_projection_of() makes
> > > it work, but that probably is not a solution :)
> > >
> > > Another question: in case the bpf qdisc set lands, could we have
> > > these kfuncs not be limited to BPF_PROG_TYPE_STRUCT_OPS?
>
> Similar to Amery's comment. Please share the patch and use case. It
> will be easier to discuss.

I tried to simplify the use case that the customer has, but I am a bit
worried that it might only confuse people more :/ however, here it is:

On the TC egress hook, the skb is stored in a map - the reason for
picking a map over a linked list or rbtree is that we want to be able to
access skbs via some index, say a hash. This is where we bump the skb's
refcount via the acquire kfunc.

During the TC ingress hook on the same interface, the skb that was
previously stored in the map is retrieved; the current skb that resides
in the hook's context carries a timestamp in its metadata. We then use
the retrieved skb and the tstamp from the metadata in skb_tstamp_tx()
(another kfunc) and finally decrement the skb's refcount via the release
kfunc.

Anyway, since we are able to do similar operations on task_struct
(holding it in a map via a kptr), I don't see a reason why we wouldn't
allow ourselves to do the same on sk_buffs, no?

> > In bpf qdisc case, we are still working on
> > releasing skb kptrs in maps or graphs automatically when .reset is
> > called so that we don't hold the resources forever.
>
> Regarding specifically the bpf qdisc case, the .reset should do the
> right thing to release the queued skb. imo, after sleeping on it, if
> the bpf prog missed releasing the skb, it is fine to depend on the map
> destruction to finally release them. It is the same as other kptr types
> stored in the map, which will also be finally released during map_free.
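For what it's worth, the egress/ingress flow described above can be
sketched roughly as below. The map layout, kfunc names, and the choice
of ctx->hash as the index are all illustrative assumptions, not the
final code:

```c
/* One kptr slot per hash; the value struct carries the stored skb. */
struct skb_slot {
	struct sk_buff __kptr *skb;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, __u32);
	__type(value, struct skb_slot);
} skb_map SEC(".maps");

SEC("tc")
int egress(struct __sk_buff *ctx)
{
	__u32 hash = ctx->hash;
	struct skb_slot *slot = bpf_map_lookup_elem(&skb_map, &hash);
	struct sk_buff *skb, *old;

	if (!slot)
		return TC_ACT_OK;

	/* Bump the refcount and stash the kptr in the map. */
	skb = bpf_skb_acquire(bpf_cast_to_kern_ctx(ctx));
	if (!skb)
		return TC_ACT_OK;
	old = bpf_kptr_xchg(&slot->skb, skb);
	if (old)
		bpf_skb_release(old);
	return TC_ACT_OK;
}

SEC("tc")
int ingress(struct __sk_buff *ctx)
{
	__u32 hash = ctx->hash;
	struct skb_slot *slot = bpf_map_lookup_elem(&skb_map, &hash);
	struct sk_buff *stored;

	if (!slot)
		return TC_ACT_OK;

	stored = bpf_kptr_xchg(&slot->skb, NULL);
	if (!stored)
		return TC_ACT_OK;

	/* Here the stored skb would be fed, together with the tstamp
	 * carried in the current skb's metadata, into skb_tstamp_tx()
	 * (another kfunc); then we drop our reference. */
	bpf_skb_release(stored);
	return TC_ACT_OK;
}
```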
> In the future, for the struct_ops case, it can be improved by allowing
> to define the sch->privdata. Maybe allow to define the layout of this
> privdata, e.g. the whole privdata is a one-element map backed by a btf
> id. The implementation will need to be generic enough for any
> bpf_struct_ops instead of something specific to the bpf-qdisc. This can
> be a follow-up improvement, as a more seamless per-sch-instance cleanup,
> after the core bpf-qdisc pieces have landed.