On 1/29/24 22:39, Martin KaFai Lau wrote:
On 1/26/24 5:17 PM, Amery Hung wrote:
On Thu, Jan 25, 2024 at 6:22 PM Martin KaFai Lau
<martin.lau@xxxxxxxxx> wrote:
On 1/23/24 9:22 PM, Amery Hung wrote:
I looked at the high level of the patchset. The major ops that it
wants to be
programmable in bpf is the ".enqueue" and ".dequeue" (+ ".init" and
".reset" in
patch 4 and patch 5).
This patch adds a new prog type BPF_PROG_TYPE_QDISC, four attach
types (each for
".enqueue", ".dequeue", ".init", and ".reset"), and a new
"bpf_qdisc_ctx" in the
uapi. It is no long an acceptable way to add new bpf extension.
Can the ".enqueue", ".dequeue", ".init", and ".reset" be completely
implemented
in bpf (with the help of new kfuncs if needed)? Then a struct_ops
for Qdisc_ops
can be created. The bpf Qdisc_ops can be loaded through the
existing struct_ops api.
Partially. If using struct_ops, I think we'll need another structure
like the following in bpf qdisc to be implemented with struct_ops bpf:
struct bpf_qdisc_ops {
int (*enqueue) (struct sk_buff *skb)
void (*dequeue) (void)
void (*init) (void)
void (*reset) (void)
};
Then, Qdisc_ops will wrap around them to handle things that cannot be
implemented with bpf (e.g., sch_tree_lock, returning a skb ptr).
We can see how those limitations (calling sch_tree_lock() and
returning a ptr)
can be addressed in bpf. This will also help other similar use cases.
For kptr, I wonder if we can support the following semantics in bpf if
they make sense:
I think they are useful but they are not fully supported now.
Some thoughts below.
1. Passing a referenced kptr into a bpf program, which will also need
to be released, or exchanged into maps or allocated objects.
"enqueue" should be the one considering here:
struct Qdisc_ops {
/* ... */
int (*enqueue)(struct sk_buff *skb,
struct Qdisc *sch,
struct sk_buff **to_free);
};
The verifier only marks the skb as a trusted kptr but does not mark its
reg->ref_obj_id. Take a look at btf_ctx_access(). In particular:
if (prog_args_trusted(prog))
info->reg_type |= PTR_TRUSTED;
The verifier does not know the skb ownership is passed into the
".enqueue" ops and does not know the bpf prog needs to release it or
store it in a map.
The verifier tracks the reference state when a KF_ACQUIRE kfunc is
called (just an example, not saying we need to use KF_ACQUIRE kfunc).
Take a look at acquire_reference_state() which is the useful one here.
Whenever the verifier is loading the ".enqueue" bpf_prog, the verifier
can always acquire_reference_state() for the "struct sk_buff *skb"
argument.
Take a look at a recent RFC:
https://lore.kernel.org/bpf/20240122212217.1391878-1-thinker.li@xxxxxxxxx/
which is tagging the argument of an ops (e.g. ".enqueue" here). That RFC
patch is tagging the argument could be NULL by appending "__nullable" to
the argument name. The verifier will enforce that the bpf prog must
check for NULL first.
The similar idea can be used here but with a different tagging (for
example, "__must_release", admittedly not a good name). While the RFC
patch is in-progress, for now, may be hardcode for the ".enqueue" ops in
check_struct_ops_btf_id() and always acquire_reference_state() for the
skb. This part can be adjusted later once the RFC patch will be in shape.
Then one more thing is to track when the struct_ops bpf prog is actually
reading the value of the skb pointer. One thing is worth to mention
here, e.g. a struct_ops prog for enqueue:
SEC("struct_ops")
int BPF_PROG(bpf_dropall_enqueue, struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
return bpf_qdisc_drop(skb, sch, to_free);
}
Take a look at the BPF_PROG macro, the bpf prog is getting a pointer to
an array of __u64 as the only argument. The skb is actually in ctx[0],
sch is in ctx[1]...etc. When ctx[0] is read to get the skb pointer (e.g.
r1 = ctx[0]), btf_ctx_access() marks the reg_type to PTR_TRUSTED. It
needs to also initialize the reg->ref_obj_id by the id obtained earlier
from acquire_reference_state() during check_struct_ops_btf_id() somehow.
2. Returning a kptr from a program and treating it as releasing the
reference.
e.g. for dequeue:
struct Qdisc_ops {
/* ... */
struct sk_buff * (*dequeue)(struct Qdisc *);
};
Right now the verifier should complain on check_reference_leak() if the
struct_ops bpf prog is returning a referenced kptr.
Unlike an argument, the return type of a function does not have a name
to tag. It is the first case that a struct_ops bpf_prog returning a
We may tag the stub functions instead, right?
Is the purpose here to return a referenced pointer from a struct_ops
operator without verifier complaining?
pointer. One idea is to assume it must be a trusted pointer
(PTR_TRUSTED) and the verifier should check it is indeed with
PTR_TRUSTED flag.
May be release_reference_state() can be called to assume the kernel will
release it as long as the return pointer type is PTR_TRUSTED and the
type matches the return type of the ops. Take a look at
check_return_code().