On 7/29/21 9:31 PM, Andrii Nakryiko wrote:
On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@xxxxxx> wrote:
On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
Add ability for users to specify custom u64 value when creating BPF link for
perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).
This is useful for cases when the same BPF program is used for attaching and
processing invocation of different tracepoints/kprobes/uprobes in a generic
fashion, but such that each invocation is distinguished from each other (e.g.,
BPF program can look up additional information associated with a specific
kernel function without having to rely on function IP lookups). This enables
new use cases to be implemented simply and efficiently that previously were
possible only through code generation (and thus multiple instances of almost
identical BPF program) or compilation at runtime (BCC-style) on target hosts
(even more expensive resource-wise). For uprobes it is not even possible in
some cases to know function IP before hand (e.g., when attaching to shared
library without PID filtering, in which case base load address is not known
for a library).
This is done by storing u64 user_ctx in struct bpf_prog_array_item,
corresponding to each attached and run BPF program. Given cgroup BPF programs
already use 2 8-byte pointers for their needs and cgroup BPF programs don't
have (yet?) support for user_ctx, reuse that space through union of
cgroup_storage and new user_ctx field.
Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
program execution code, which luckily is now also split from
BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
giving access to this user context value from inside a BPF program. Generic
perf_event BPF programs will access this value from perf_event itself through
passed in BPF program context.
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
---
drivers/media/rc/bpf-lirc.c | 4 ++--
include/linux/bpf.h | 16 +++++++++++++++-
include/linux/perf_event.h | 1 +
include/linux/trace_events.h | 6 +++---
include/uapi/linux/bpf.h | 7 +++++++
kernel/bpf/core.c | 29 ++++++++++++++++++-----------
kernel/bpf/syscall.c | 2 +-
kernel/events/core.c | 21 ++++++++++++++-------
kernel/trace/bpf_trace.c | 8 +++++---
tools/include/uapi/linux/bpf.h | 7 +++++++
10 files changed, 73 insertions(+), 28 deletions(-)
diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index afae0afe3f81..7490494273e4 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
goto unlock;
}
- ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
+ ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
if (ret < 0)
goto unlock;
[...]
void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 00b1267ab4f0..bc1fd54a8f58 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1448,6 +1448,13 @@ union bpf_attr {
__aligned_u64 iter_info; /* extra bpf_iter_link_info */
__u32 iter_info_len; /* iter_info length */
};
+ struct {
+ /* black box user-provided value passed through
+ * to BPF program at the execution time and
+ * accessible through bpf_get_user_ctx() BPF helper
+ */
+ __u64 user_ctx;
+ } perf_event;
Is it possible to fold this field into previous union?
union {
__u32 target_btf_id; /* btf_id of
target to attach to */
struct {
__aligned_u64 iter_info; /*
extra bpf_iter_link_info */
__u32 iter_info_len; /*
iter_info length */
};
};
I didn't want to do it, because different types of BPF links will
accept this user_ctx (or now bpf_cookie). And then we'll have to have
different locations of that field for different types of links.
For example, when/if we add this user_ctx to BPF iterator programs,
having __u64 user_ctx in the same anonymous union will make it overlap
with iter_info, which is a problem. So I want to have a link
type-specific sections in LINK_CREATE command section, to allow the
same field name at different locations.
I actually think that we should put iter_info/iter_info_len into a
named field, like this (also added user_ctx for bpf_iter link as a
demonstration):
struct {
__aligned_u64 info;
__u32 info_len;
__aligned_u64 user_ctx; /* see how it's at a different offset
than perf_event.user_ctx */
} iter;
struct {
__u64 user_ctx;
} perf_event;
(of course keeping already existing fields in anonymous struct for
backwards compatibility)
Okay, then since user_ctx may be used by many link types. How
about just with the field "user_ctx" without struct perf_event.
Sometime like
__u64 user_ctx;
instead of
struct {
__u64 user_ctx;
} perf_event;
I decided to not do that in this patch set, though, to not distract
from the main goal. But I think we should avoid this shared field
"namespace" across different link types going forward.
};
} link_create;
[...]