Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc

Martin KaFai Lau <martin.lau@xxxxxxxxx> · Thu, 1 Feb 2024 17:47:44 -0800

On 1/31/24 8:23 AM, Amery Hung wrote:
1. Passing a referenced kptr into a bpf program, which will also need
to be released, or exchanged into maps or allocated objects.
".enqueue" is the one to consider here:

struct Qdisc_ops {
         /* ... */
         int                     (*enqueue)(struct sk_buff *skb,
                                            struct Qdisc *sch,
                                            struct sk_buff **to_free);

};

The verifier only marks the skb as a trusted kptr but does not mark its
reg->ref_obj_id. Take a look at btf_ctx_access(). In particular:

         if (prog_args_trusted(prog))
                 info->reg_type |= PTR_TRUSTED;

The verifier does not know that ownership of the skb is passed into the
".enqueue" op, so it does not know that the bpf prog must release the skb or
store it in a map.

The verifier tracks the reference state when a KF_ACQUIRE kfunc is called
(just an example; I am not saying we need to use a KF_ACQUIRE kfunc). Take a
look at acquire_reference_state(), which is the useful one here.

When the verifier loads the ".enqueue" bpf_prog, it can always call
acquire_reference_state() for the "struct sk_buff *skb" argument.
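To make that concrete, below is a userspace toy model (not verifier code) of the idea: when an ".enqueue" prog is loaded, a reference is acquired up front for the skb argument, and at program exit any reference that was neither released nor handed off causes a rejection. The names toy_ref_state, toy_acquire_ref(), toy_release_ref(), toy_check_exit() and toy_enqueue_prog_start() are all hypothetical; they only loosely mirror what acquire_reference_state() and the exit-time leak check do in kernel/bpf/verifier.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of verifier reference tracking. */
#define TOY_MAX_REFS 8

struct toy_ref_state {
	int ids[TOY_MAX_REFS]; /* ids of references still held by the prog */
	int cnt;               /* number of live references */
	int next_id;           /* id generator; 0 means "no reference" */
};

/* Loosely mirrors acquire_reference_state(): record a new live reference. */
static int toy_acquire_ref(struct toy_ref_state *st)
{
	if (st->cnt == TOY_MAX_REFS)
		return -1;
	st->ids[st->cnt++] = ++st->next_id;
	return st->next_id;
}

/* Drop a reference, e.g. when a KF_RELEASE kfunc is called on it. */
static bool toy_release_ref(struct toy_ref_state *st, int id)
{
	for (int i = 0; i < st->cnt; i++) {
		if (st->ids[i] == id) {
			st->ids[i] = st->ids[--st->cnt];
			return true;
		}
	}
	return false; /* unknown or already released: would be rejected */
}

/* At program exit, every acquired reference must have been released. */
static bool toy_check_exit(const struct toy_ref_state *st)
{
	return st->cnt == 0;
}

/* Loading an ".enqueue" prog: acquire a reference for skb before any insn. */
static int toy_enqueue_prog_start(struct toy_ref_state *st)
{
	*st = (struct toy_ref_state){ 0 };
	return toy_acquire_ref(st);
}
```

A prog that returns without dropping or storing the skb then fails toy_check_exit(), and a second release of the same id is refused.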

Take a look at a recent RFC:
https://lore.kernel.org/bpf/20240122212217.1391878-1-thinker.li@xxxxxxxxx/
which tags the arguments of an ops (e.g. ".enqueue" here). That RFC patch
marks an argument that could be NULL by appending "__nullable" to the
argument name, and the verifier then enforces that the bpf prog checks it for
NULL first.

A similar idea can be used here but with a different tag (for example,
"__must_release", admittedly not a good name). While the RFC patch is still
in progress, we could for now hardcode the ".enqueue" ops in
check_struct_ops_btf_id() and always call acquire_reference_state() for the
skb. This part can be adjusted later once the RFC patch is in shape.
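The tagging scheme keys off a suffix on the argument name. A small userspace sketch of just the name matching, assuming the hypothetical "__must_release" suffix sits alongside "__nullable"; the flag names TOY_ARG_NULLABLE and TOY_ARG_MUST_RELEASE and the helper toy_arg_flags_from_name() are made up for illustration and are not what the RFC implements.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flags a verifier could derive from argument-name suffixes. */
enum toy_arg_flags {
	TOY_ARG_NULLABLE     = 1 << 0, /* prog must NULL-check before use */
	TOY_ARG_MUST_RELEASE = 1 << 1, /* prog owns a reference to release */
};

/* True if name ends with suffix. */
static int has_suffix(const char *name, const char *suffix)
{
	size_t n = strlen(name), s = strlen(suffix);

	return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Map an ops argument name such as "skb__must_release" to flags. */
static unsigned int toy_arg_flags_from_name(const char *name)
{
	unsigned int flags = 0;

	if (has_suffix(name, "__nullable"))
		flags |= TOY_ARG_NULLABLE;
	if (has_suffix(name, "__must_release"))
		flags |= TOY_ARG_MUST_RELEASE;
	return flags;
}
```

Hardcoding ".enqueue" in check_struct_ops_btf_id() amounts to skipping this lookup and treating the skb argument as if it carried the must-release flag.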

Makes sense. One more thing to consider here is that .enqueue is
actually a reference-acquiring and a reference-releasing function at the
same time. Assuming that ctx written to by a struct_ops program can be seen
by the kernel, another new tag for the "to_free" argument will still be
needed so that the verifier can recognize when an skb is written to
"to_free".

I don't think "to_free" needs special tagging. I was thinking the
"bpf_qdisc_drop" kfunc could be a KF_RELEASE kfunc. Ideally, it would look like:

__bpf_kfunc int bpf_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
	                       struct sk_buff **to_free)
{
	return qdisc_drop(skb, sch, to_free);
}
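For reference, qdisc_drop() in include/net/sch_generic.h does not free the skb right away: it links the skb onto the caller's to_free list through the pointer-to-pointer, bumps the drop stats and returns NET_XMIT_DROP, and the caller frees the list after enqueue returns. A simplified userspace mock of that behavior follows; the structs, the drops counter (standing in for qdisc_qstats_drop()) and free_skb_list() are stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NET_XMIT_DROP 0x01 /* same value as in include/linux/netdevice.h */

/* Simplified stand-ins for the kernel structures. */
struct sk_buff {
	struct sk_buff *next;
	int id;
};

struct Qdisc {
	unsigned int drops; /* stands in for qdisc_qstats_drop() */
};

/* Like the kernel's __qdisc_drop(): push skb onto *to_free, do not free. */
static void qdisc_drop_push(struct sk_buff *skb, struct sk_buff **to_free)
{
	skb->next = *to_free;
	*to_free = skb;
}

static int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
		      struct sk_buff **to_free)
{
	qdisc_drop_push(skb, to_free);
	sch->drops++;
	return NET_XMIT_DROP;
}

/* Caller side: free everything collected on the list, return the count. */
static int free_skb_list(struct sk_buff *skb)
{
	int n = 0;

	while (skb) {
		struct sk_buff *next = skb->next;

		free(skb);
		skb = next;
		n++;
	}
	return n;
}
```

This is why the release has to go through *to_free: the prog never frees the skb itself, it only hands ownership to the list the caller will free.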

However, I don't think the verifier currently supports pointer-to-pointer
arguments, meaning "struct sk_buff **to_free" does not work.

If the ptr indirection spinning in my head is sound, one possible solution to 
unblock the qdisc work is to introduce:

struct bpf_sk_buff_ptr {
	struct sk_buff *skb;
};

and the bpf_qdisc_drop kfunc:

__bpf_kfunc int bpf_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
                               struct bpf_sk_buff_ptr *to_free_list);

and the enqueue prog:

SEC("struct_ops/enqueue")
int BPF_PROG(test_enqueue, struct sk_buff *skb,
             struct Qdisc *sch,
             struct bpf_sk_buff_ptr *to_free_list)
{
	return bpf_qdisc_drop(skb, sch, to_free_list);
}

and ".is_valid_access" needs to change the btf_type from "struct sk_buff **"
to "struct bpf_sk_buff_ptr *", which is somewhat similar to how bpf_tcp_ca.c
changes the "struct sock *" type to the "struct tcp_sock *" type.
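The retyping is only sound because a struct whose sole member is a struct sk_buff * has, in practice, the same size and alignment as the pointer itself, with the member at offset 0, so the kernel's struct sk_buff ** can be exposed to the prog as struct bpf_sk_buff_ptr * and converted back inside the kfunc. A userspace sketch of that round trip with mock structs; the bodies of bpf_qdisc_drop() and the hypothetical call_bpf_enqueue() are assumptions about what such code could look like, not the code in the branch below.

```c
#include <assert.h>
#include <stddef.h>

#define NET_XMIT_DROP 0x01

/* Mock kernel structures. */
struct sk_buff {
	struct sk_buff *next;
};

struct Qdisc {
	unsigned int drops;
};

/* The proposed wrapper: one pointer the verifier can give a BTF type. */
struct bpf_sk_buff_ptr {
	struct sk_buff *skb;
};

/* The casts below rely on the wrapper adding no offset or padding. */
static_assert(offsetof(struct bpf_sk_buff_ptr, skb) == 0, "skb must be first");
static_assert(sizeof(struct bpf_sk_buff_ptr) == sizeof(struct sk_buff *),
	      "wrapper must be exactly one pointer");

static int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
		      struct sk_buff **to_free)
{
	skb->next = *to_free;
	*to_free = skb;
	sch->drops++;
	return NET_XMIT_DROP;
}

/* Sketch of the kfunc: turn the wrapper back into the original pointer. */
static int bpf_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
			  struct bpf_sk_buff_ptr *to_free_list)
{
	return qdisc_drop(skb, sch, (struct sk_buff **)to_free_list);
}

/* Hypothetical kernel-side caller: to_free is retyped for the prog. */
static int call_bpf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
			    struct sk_buff **to_free)
{
	/* stands in for running the struct_ops prog, which calls the kfunc */
	return bpf_qdisc_drop(skb, sch, (struct bpf_sk_buff_ptr *)to_free);
}
```

Because the pointer is converted back to its original type before it is dereferenced, the only real requirement is the layout checked by the two static_asserts.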

I have a compile-tested version of the idea here:
https://git.kernel.org/pub/scm/linux/kernel/git/martin.lau/bpf-next.git/log/?h=qdisc-ideas

Then one more thing is to track when the struct_ops bpf prog actually reads
the value of the skb pointer. One thing worth mentioning here, e.g. for a
struct_ops prog for enqueue:

SEC("struct_ops")
int BPF_PROG(bpf_dropall_enqueue, struct sk_buff *skb, struct Qdisc *sch,
              struct sk_buff **to_free)
{
         return bpf_qdisc_drop(skb, sch, to_free);
}

Take a look at the BPF_PROG macro: the bpf prog gets a pointer to an array of
__u64 as its only argument. The skb is actually in ctx[0], sch is in ctx[1],
etc. When ctx[0] is read to get the skb pointer (e.g. r1 = ctx[0]),
btf_ctx_access() marks the reg_type as PTR_TRUSTED. It also needs to
initialize reg->ref_obj_id with the id obtained earlier from
acquire_reference_state() during check_struct_ops_btf_id() somehow.
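A userspace sketch of the two halves described above: the prog sees one __u64 array with skb in ctx[0] and sch in ctx[1], and a load from ctx[0] would need to yield a register that carries the ref_obj_id acquired at load time. Everything here (toy_reg, toy_prog_state, toy_load_ctx_arg(), toy_unpack_enqueue_ctx()) is hypothetical bookkeeping for illustration, not btf_ctx_access() itself; it marks every slot PTR_TRUSTED only because all three ".enqueue" arguments are pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified register state: what the verifier knows about a loaded value. */
enum toy_reg_type { TOY_SCALAR, TOY_PTR_TRUSTED };

struct toy_reg {
	enum toy_reg_type type;
	int ref_obj_id; /* 0 means "not a referenced object" */
};

#define TOY_NR_ARGS 3 /* skb, sch, to_free for ".enqueue" */

/* Filled in once at prog load, e.g. from acquire_reference_state() for skb. */
struct toy_prog_state {
	int arg_ref_id[TOY_NR_ARGS]; /* acquired ref id per ctx slot, or 0 */
};

/* Model of a ctx load "r = *(u64 *)(ctx + off)": one __u64 slot per arg. */
static struct toy_reg toy_load_ctx_arg(const struct toy_prog_state *ps, int off)
{
	int arg = off / (int)sizeof(uint64_t);
	struct toy_reg reg = { TOY_PTR_TRUSTED, 0 };

	assert(off % (int)sizeof(uint64_t) == 0 && arg >= 0 && arg < TOY_NR_ARGS);
	/* Today only PTR_TRUSTED is set; the missing piece is the ref id. */
	reg.ref_obj_id = ps->arg_ref_id[arg];
	return reg;
}

/* Conceptually what BPF_PROG() does: unpack typed args from the u64 array. */
static void toy_unpack_enqueue_ctx(const uint64_t *ctx, void **skb, void **sch)
{
	*skb = (void *)(uintptr_t)ctx[0];
	*sch = (void *)(uintptr_t)ctx[1];
}
```

With the id attached to r1, a later KF_RELEASE call such as bpf_qdisc_drop() on that register can release the reference acquired at load time.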