On Thu, Jul 29, 2021 at 10:49 PM Yonghong Song <yhs@xxxxxx> wrote: > > > > On 7/29/21 9:31 PM, Andrii Nakryiko wrote: > > On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@xxxxxx> wrote: > >> > >> > >> > >> On 7/26/21 9:12 AM, Andrii Nakryiko wrote: > >>> Add ability for users to specify custom u64 value when creating BPF link for > >>> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints). > >>> > >>> This is useful for cases when the same BPF program is used for attaching and > >>> processing invocation of different tracepoints/kprobes/uprobes in a generic > >>> fashion, but such that each invocation is distinguished from each other (e.g., > >>> BPF program can look up additional information associated with a specific > >>> kernel function without having to rely on function IP lookups). This enables > >>> new use cases to be implemented simply and efficiently that previously were > >>> possible only through code generation (and thus multiple instances of almost > >>> identical BPF program) or compilation at runtime (BCC-style) on target hosts > >>> (even more expensive resource-wise). For uprobes it is not even possible in > >>> some cases to know function IP before hand (e.g., when attaching to shared > >>> library without PID filtering, in which case base load address is not known > >>> for a library). > >>> > >>> This is done by storing u64 user_ctx in struct bpf_prog_array_item, > >>> corresponding to each attached and run BPF program. Given cgroup BPF programs > >>> already use 2 8-byte pointers for their needs and cgroup BPF programs don't > >>> have (yet?) support for user_ctx, reuse that space through union of > >>> cgroup_storage and new user_ctx field. > >>> > >>> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx. > >>> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF > >>> program execution code, which luckily is now also split from > >>> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper > >>> giving access to this user context value from inside a BPF program. Generic > >>> perf_event BPF programs will access this value from perf_event itself through > >>> passed in BPF program context. > >>> > >>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > >>> Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx> > >>> --- > >>> drivers/media/rc/bpf-lirc.c | 4 ++-- > >>> include/linux/bpf.h | 16 +++++++++++++++- > >>> include/linux/perf_event.h | 1 + > >>> include/linux/trace_events.h | 6 +++--- > >>> include/uapi/linux/bpf.h | 7 +++++++ > >>> kernel/bpf/core.c | 29 ++++++++++++++++++----------- > >>> kernel/bpf/syscall.c | 2 +- > >>> kernel/events/core.c | 21 ++++++++++++++------- > >>> kernel/trace/bpf_trace.c | 8 +++++--- > >>> tools/include/uapi/linux/bpf.h | 7 +++++++ > >>> 10 files changed, 73 insertions(+), 28 deletions(-) > >>> > >>> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c > >>> index afae0afe3f81..7490494273e4 100644 > >>> --- a/drivers/media/rc/bpf-lirc.c > >>> +++ b/drivers/media/rc/bpf-lirc.c > >>> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog) > >>> goto unlock; > >>> } > >>> > >>> - ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array); > >>> + ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array); > >>> if (ret < 0) > >>> goto unlock; > >>> > >> [...] > >>> void bpf_trace_run1(struct bpf_prog *prog, u64 arg1); > >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >>> index 00b1267ab4f0..bc1fd54a8f58 100644 > >>> --- a/include/uapi/linux/bpf.h > >>> +++ b/include/uapi/linux/bpf.h > >>> @@ -1448,6 +1448,13 @@ union bpf_attr { > >>> __aligned_u64 iter_info; /* extra bpf_iter_link_info */ > >>> __u32 iter_info_len; /* iter_info length */ > >>> }; > >>> + struct { > >>> + /* black box user-provided value passed through > >>> + * to BPF program at the execution time and > >>> + * accessible through bpf_get_user_ctx() BPF helper > >>> + */ > >>> + __u64 user_ctx; > >>> + } perf_event; > >> > >> Is it possible to fold this field into previous union? > >> > >> union { > >> __u32 target_btf_id; /* btf_id of > >> target to attach to */ > >> struct { > >> __aligned_u64 iter_info; /* > >> extra bpf_iter_link_info */ > >> __u32 iter_info_len; /* > >> iter_info length */ > >> }; > >> }; > >> > >> > > > > I didn't want to do it, because different types of BPF links will > > accept this user_ctx (or now bpf_cookie). And then we'll have to have > > different locations of that field for different types of links. > > > > For example, when/if we add this user_ctx to BPF iterator programs, > > having __u64 user_ctx in the same anonymous union will make it overlap > > with iter_info, which is a problem. So I want to have a link > > type-specific sections in LINK_CREATE command section, to allow the > > same field name at different locations. > > > > I actually think that we should put iter_info/iter_info_len into a > > named field, like this (also added user_ctx for bpf_iter link as a > > demonstration): > > > > struct { > > __aligned_u64 info; > > __u32 info_len; > > __aligned_u64 user_ctx; /* see how it's at a different offset > > than perf_event.user_ctx */ > > } iter; > > struct { > > __u64 user_ctx; > > } perf_event; > > > > (of course keeping already existing fields in anonymous struct for > > backwards compatibility) > > Okay, then since user_ctx may be used by many link types. How > about just with the field "user_ctx" without struct perf_event. I'd love to do it because it is indeed generic and common field, like target_fd. But I'm not sure what you are proposing below. Where exactly that user_ctx (now called bpf_cookie) goes in your example? I see few possible options that allow preserving ABI backwards compatibility. Let's see if you and everyone else likes any of those better. I'll use the full LINK_CREATE sub-struct definition from bpf_attr to make it clear. And to demonstrate how this can be extended to bpf_iter in the future, please note this part as this is an important aspect. 1. Full backwards compatibility and per-link type sections (my current approach): struct { /* struct used by BPF_LINK_CREATE command */ __u32 prog_fd; union { __u32 target_fd; __u32 target_ifindex; }; __u32 attach_type; __u32 flags; union { __u32 target_btf_id; struct { __aligned_u64 iter_info; __u32 iter_info_len; }; struct { __u64 bpf_cookie; } perf_event; struct { __aligned_u64 info; __u32 info_len; __aligned_u64 bpf_cookie; } iter; }; } link_create; The good property here is that we can keep easily extending link type-specific sections with extra fields where needed. For common stuff like bpf_cookie it's suboptimal because we'll need to duplicate field definition in each struct inside that union, but I think that's fine. From end-user point of view, they will know which type of link they are creating, so the use will be straightforward. This is why I went with this approach. But let's consider alternatives. 2. Non-backwards compatible layout but extra flag to specify that new field layout is used. struct { /* struct used by BPF_LINK_CREATE command */ __u32 prog_fd; union { __u32 target_fd; __u32 target_ifindex; }; __u32 attach_type; __u32 flags; /* this will start supporting some new flag like BPF_F_LINK_CREATE_NEW */ __u64 bpf_cookie; /* common field now */ union { /* this parts is effectively deprecated now */ __u32 target_btf_id; struct { __aligned_u64 iter_info; __u32 iter_info_len; }; struct { /* this is new layout, but needs BPF_F_LINK_CREATE_NEW, at least for ext/ and bpf_iter/ programs */ __u64 bpf_cookie; union { struct { __u32 target_btf_id; } ext; struct { __aligned_u64 info; __u32 info_len; } iter; } } }; } link_create; This makes bpf_cookie a common field, but at least for EXT (freplace/) and ITER (bpf_iter/) links we need to specify extra flag to specify that we are not using iter_info/iter_info_len/target_btf_id. bpf_iter then will use iter.info and iter.info_len, and can use plain bpf_cookie. IMO, this is way too confusing and a maintainability nightmare. I'm trying to guess what you are proposing, I can read it two ways, but let me know if I missed something. 3. Just add bpf_cookie field before link type-specific section. struct { /* struct used by BPF_LINK_CREATE command */ __u32 prog_fd; union { __u32 target_fd; __u32 target_ifindex; }; __u32 attach_type; __u32 flags; __u64 bpf_cookie; // <<<<<<<<<< HERE union { __u32 target_btf_id; struct { __aligned_u64 iter_info; __u32 iter_info_len; }; }; } link_create; This looks really nice and would be great, but that changes offsets for target_btf_id/iter_info/iter_info_len, so a no go. The only way to rectify this is what proposal #2 above does with an extra flag. 4. Add bpf_cookie after link-type specific part: struct { /* struct used by BPF_LINK_CREATE command */ __u32 prog_fd; union { __u32 target_fd; __u32 target_ifindex; }; __u32 attach_type; __u32 flags; union { __u32 target_btf_id; struct { __aligned_u64 iter_info; __u32 iter_info_len; }; struct { }; __u64 bpf_cookie; // <<<<<<<<<<<<<<<<<< HERE } link_create; This could work. But we are wasting 16 bytes currently used for target_btf_id/iter_info/iter_info_len. If we later need to do something link type-specific, we can add it to the existing union if we need <= 16 bytes, otherwise we'll need to start another union after bpf_cookie, splitting this into two link type-specific sections. Overall, this might work, especially assuming we won't need to extend iter-specific portions. But I really hate that we didn't do named structs inside that union (i.e., ext.target_btf_id and iter.info/iter.info_len) and I'd like to rectify that in the follow up patches with named structs duplicating existing field layout, but with proper naming. But splitting this LINK_CREATE bpf_attr part into two unions would make it hard and awkward in the future. So, thoughts? Did you have something else in mind that I missed? > Sometime like > > __u64 user_ctx; > > instead of > > struct { > __u64 user_ctx; > } perf_event; > > > > > I decided to not do that in this patch set, though, to not distract > > from the main goal. But I think we should avoid this shared field > > "namespace" across different link types going forward. > > > > > >>> }; > >>> } link_create; > >>> > >> [...]