On Thu, Feb 1, 2024 at 5:47 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 1/31/24 8:23 AM, Amery Hung wrote:
> >>> 1. Passing a referenced kptr into a bpf program, which will also need
> >>> to be released, or exchanged into maps or allocated objects.
> >> "enqueue" should be the one considering here:
> >>
> >> struct Qdisc_ops {
> >>         /* ... */
> >>         int (*enqueue)(struct sk_buff *skb,
> >>                        struct Qdisc *sch,
> >>                        struct sk_buff **to_free);
> >>
> >> };
> >>
> >> The verifier only marks the skb as a trusted kptr but does not mark its
> >> reg->ref_obj_id. Take a look at btf_ctx_access(). In particular:
> >>
> >>         if (prog_args_trusted(prog))
> >>                 info->reg_type |= PTR_TRUSTED;
> >>
> >> The verifier does not know the skb ownership is passed into the ".enqueue" ops
> >> and does not know the bpf prog needs to release it or store it in a map.
> >>
> >> The verifier tracks the reference state when a KF_ACQUIRE kfunc is called (just
> >> an example, not saying we need to use KF_ACQUIRE kfunc). Take a look at
> >> acquire_reference_state() which is the useful one here.
> >>
> >> Whenever the verifier is loading the ".enqueue" bpf_prog, the verifier can
> >> always acquire_reference_state() for the "struct sk_buff *skb" argument.
> >>
> >> Take a look at a recent RFC:
> >> https://lore.kernel.org/bpf/20240122212217.1391878-1-thinker.li@xxxxxxxxx/
> >> which is tagging the argument of an ops (e.g. ".enqueue" here). That RFC patch
> >> is tagging the argument could be NULL by appending "__nullable" to the argument
> >> name. The verifier will enforce that the bpf prog must check for NULL first.
> >>
> >> The similar idea can be used here but with a different tagging (for example,
> >> "__must_release", admittedly not a good name). While the RFC patch is
> >> in-progress, for now, may be hardcode for the ".enqueue" ops in
> >> check_struct_ops_btf_id() and always acquire_reference_state() for the skb. This
> >> part can be adjusted later once the RFC patch will be in shape.
> >>
> > Make sense. One more thing to consider here is that .enqueue is
> > actually a reference acquiring and releasing function at the same
> > time. Assuming ctx written to by a struct_ops program can be seen by
> > the kernel, another new tag for the "to_free" argument will still be
> > needed so that the verifier can recognize when writing skb to
> > "to_free".
>
> I don't think "to_free" needs special tagging. I was thinking the
> "bpf_qdisc_drop" kfunc could be a KF_RELEASE. Ideally, it should be like
>
> __bpf_kfunc int bpf_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
>                                struct sk_buff **to_free)
> {
>         return qdisc_drop(skb, sch, to_free);
> }
>
> However, I don't think the verifier supports pointer to pointer now. Meaning
> "struct sk_buff **to_free" does not work.
>
> If the ptr indirection spinning in my head is sound, one possible solution to
> unblock the qdisc work is to introduce:
>
> struct bpf_sk_buff_ptr {
>         struct sk_buff *skb;
> };
>
> and the bpf_qdisc_drop kfunc:
>
> __bpf_kfunc int bpf_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
>                                struct bpf_sk_buff_ptr *to_free_list)
>
> and the enqueue prog:
>
> SEC("struct_ops/enqueue")
> int BPF_PROG(test_enqueue, struct sk_buff *skb,
>              struct Qdisc *sch,
>              struct bpf_sk_buff_ptr *to_free_list)
> {
>         return bpf_qdisc_drop(skb, sch, to_free_list);
> }
>
> and the ".is_valid_access" needs to change the btf_type from "struct sk_buff **"
> to "struct bpf_sk_buff_ptr *" which is sort of similar to the bpf_tcp_ca.c that
> is changing the "struct sock *" type to the "struct tcp_sock *" type.
>
> I have the compiler-tested idea here:
> https://git.kernel.org/pub/scm/linux/kernel/git/martin.lau/bpf-next.git/log/?h=qdisc-ideas
>
>
> >
> >> Then one more thing is to track when the struct_ops bpf prog is actually reading
> >> the value of the skb pointer. One thing is worth to mention here, e.g. a
> >> struct_ops prog for enqueue:
> >>
> >> SEC("struct_ops")
> >> int BPF_PROG(bpf_dropall_enqueue, struct sk_buff *skb, struct Qdisc *sch,
> >>              struct sk_buff **to_free)
> >> {
> >>         return bpf_qdisc_drop(skb, sch, to_free);
> >> }
> >>
> >> Take a look at the BPF_PROG macro, the bpf prog is getting a pointer to an array
> >> of __u64 as the only argument. The skb is actually in ctx[0], sch is in
> >> ctx[1]...etc. When ctx[0] is read to get the skb pointer (e.g. r1 = ctx[0]),
> >> btf_ctx_access() marks the reg_type to PTR_TRUSTED. It needs to also initialize
> >> the reg->ref_obj_id by the id obtained earlier from acquire_reference_state()
> >> during check_struct_ops_btf_id() somehow.
>

I appreciate the idea. The pointer redirection works without problems.
I now have a working fifo bpf qdisc using struct_ops. I will explore
how other parts of qdisc work with struct_ops.

Thanks,
Amery
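
For anyone following the thread, the pointer indirection discussed above can
be modeled in plain user-space C. This is only a sketch under stated
assumptions, not the code in the qdisc-ideas branch: the struct definitions
are stripped-down stand-ins for the kernel types, and model_qdisc_drop(),
model_bpf_qdisc_drop(), model_enqueue(), model_run() and MODEL_XMIT_DROP are
hypothetical names made up for illustration. It shows the two points quoted
above: the prog receives its arguments as an array of 64-bit slots, and a
"struct sk_buff **" can be presented to the prog as a pointer to a
one-member wrapper struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stripped-down stand-ins for the kernel types (assumption: only the
 * fields this sketch touches are modeled). */
struct sk_buff {
        struct sk_buff *next;
};

struct Qdisc {
        unsigned int drops;
};

/* Wrapper proposed in the thread: one pointer member, so it has the
 * same size as a bare "struct sk_buff *". */
struct bpf_sk_buff_ptr {
        struct sk_buff *skb;
};

static_assert(sizeof(struct bpf_sk_buff_ptr) == sizeof(struct sk_buff *),
              "wrapper must be layout-compatible with a bare pointer");

#define MODEL_XMIT_DROP 1       /* hypothetical return code for this model */

/* Model of a drop helper: push skb onto the caller's to_free list. */
static int model_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
                            struct sk_buff **to_free)
{
        skb->next = *to_free;
        *to_free = skb;
        sch->drops++;
        return MODEL_XMIT_DROP;
}

/* Model of the kfunc: takes the wrapper instead of a pointer to pointer. */
static int model_bpf_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
                                struct bpf_sk_buff_ptr *to_free_list)
{
        return model_qdisc_drop(skb, sch, &to_free_list->skb);
}

/* Model of the enqueue prog: arguments arrive as an array of 64-bit
 * slots, ctx[0] = skb, ctx[1] = sch, ctx[2] = to_free. */
static int model_enqueue(uint64_t *ctx)
{
        struct sk_buff *skb = (struct sk_buff *)(uintptr_t)ctx[0];
        struct Qdisc *sch = (struct Qdisc *)(uintptr_t)ctx[1];
        struct bpf_sk_buff_ptr *to_free_list =
                (struct bpf_sk_buff_ptr *)(uintptr_t)ctx[2];

        return model_bpf_qdisc_drop(skb, sch, to_free_list);
}

/* Model of the kernel-side caller: owns a "struct sk_buff *to_free" and
 * passes its address through the ctx array. Returns how many skbs are
 * on the to_free list after the prog has run. */
static int model_run(struct sk_buff *skb, struct Qdisc *sch)
{
        struct sk_buff *to_free = NULL;
        uint64_t ctx[3] = {
                (uint64_t)(uintptr_t)skb,
                (uint64_t)(uintptr_t)sch,
                (uint64_t)(uintptr_t)&to_free,
        };
        int n = 0;

        model_enqueue(ctx);
        for (struct sk_buff *p = to_free; p; p = p->next)
                n++;
        return n;
}
```

Calling model_run() with one skb and one Qdisc leaves that skb on the
caller's to_free list and bumps the drop counter once. None of the verifier
side (reference tracking, ref_obj_id, the ".is_valid_access" type change) is
modeled here.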