Re: [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links

Yonghong Song <yhs@xxxxxx> · Fri, 30 Jul 2021 15:28:39 -0700

On 7/30/21 3:06 PM, Andrii Nakryiko wrote:
On Fri, Jul 30, 2021 at 2:34 PM Yonghong Song <yhs@xxxxxx> wrote:



On 7/30/21 10:48 AM, Andrii Nakryiko wrote:
On Thu, Jul 29, 2021 at 10:49 PM Yonghong Song <yhs@xxxxxx> wrote:



On 7/29/21 9:31 PM, Andrii Nakryiko wrote:
On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@xxxxxx> wrote:



On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
Add ability for users to specify custom u64 value when creating BPF link for
perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).

This is useful for cases when the same BPF program is used for attaching and
processing invocation of different tracepoints/kprobes/uprobes in a generic
fashion, but such that each invocation is distinguished from each other (e.g.,
BPF program can look up additional information associated with a specific
kernel function without having to rely on function IP lookups). This enables
new use cases to be implemented simply and efficiently that previously were
possible only through code generation (and thus multiple instances of almost
identical BPF program) or compilation at runtime (BCC-style) on target hosts
(even more expensive resource-wise). For uprobes it is not even possible in
some cases to know function IP before hand (e.g., when attaching to shared
library without PID filtering, in which case base load address is not known
for a library).

This is done by storing u64 user_ctx in struct bpf_prog_array_item,
corresponding to each attached and run BPF program. Given cgroup BPF programs
already use 2 8-byte pointers for their needs and cgroup BPF programs don't
have (yet?) support for user_ctx, reuse that space through union of
cgroup_storage and new user_ctx field.

Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
program execution code, which luckily is now also split from
BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
giving access to this user context value from inside a BPF program. Generic
perf_event BPF programs will access this value from perf_event itself through
passed in BPF program context.

Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
---
     drivers/media/rc/bpf-lirc.c    |  4 ++--
     include/linux/bpf.h            | 16 +++++++++++++++-
     include/linux/perf_event.h     |  1 +
     include/linux/trace_events.h   |  6 +++---
     include/uapi/linux/bpf.h       |  7 +++++++
     kernel/bpf/core.c              | 29 ++++++++++++++++++-----------
     kernel/bpf/syscall.c           |  2 +-
     kernel/events/core.c           | 21 ++++++++++++++-------
     kernel/trace/bpf_trace.c       |  8 +++++---
     tools/include/uapi/linux/bpf.h |  7 +++++++
     10 files changed, 73 insertions(+), 28 deletions(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index afae0afe3f81..7490494273e4 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
                 goto unlock;
         }

-     ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
+     ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
         if (ret < 0)
                 goto unlock;

[...]
     void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 00b1267ab4f0..bc1fd54a8f58 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1448,6 +1448,13 @@ union bpf_attr {
                                 __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
                                 __u32           iter_info_len;  /* iter_info length */
                         };
+                     struct {
+                             /* black box user-provided value passed through
+                              * to BPF program at the execution time and
+                              * accessible through bpf_get_user_ctx() BPF helper
+                              */
+                             __u64           user_ctx;
+                     } perf_event;

Is it possible to fold this field into previous union?

                    union {
                            __u32           target_btf_id;  /* btf_id of
target to attach to */
                            struct {
                                    __aligned_u64   iter_info;      /*
extra bpf_iter_link_info */
                                    __u32           iter_info_len;  /*
iter_info length */
                            };
                    };



I didn't want to do it, because different types of BPF links will
accept this user_ctx (or now bpf_cookie). And then we'll have to have
different locations of that field for different types of links.

For example, when/if we add this user_ctx to BPF iterator programs,
having __u64 user_ctx in the same anonymous union will make it overlap
with iter_info, which is a problem. So I want to have a link
type-specific sections in LINK_CREATE command section, to allow the
same field name at different locations.

I actually think that we should put iter_info/iter_info_len into a
named field, like this (also added user_ctx for bpf_iter link as a
demonstration):

struct {
       __aligned_u64 info;
       __u32         info_len;
       __aligned_u64 user_ctx;  /* see how it's at a different offset
than perf_event.user_ctx */
} iter;
struct {
       __u64         user_ctx;
} perf_event;

(of course keeping already existing fields in anonymous struct for
backwards compatibility)

Okay, then since user_ctx may be used by many link types. How
about just with the field "user_ctx" without struct perf_event.

I'd love to do it because it is indeed generic and common field, like
target_fd. But I'm not sure what you are proposing below. Where
exactly that user_ctx (now called bpf_cookie) goes in your example? I
see few possible options that allow preserving ABI backwards
compatibility. Let's see if you and everyone else likes any of those
better. I'll use the full LINK_CREATE sub-struct definition from
bpf_attr to make it clear. And to demonstrate how this can be extended
to bpf_iter in the future, please note this part as this is an
important aspect.

1. Full backwards compatibility and per-link type sections (my current
approach):

          struct { /* struct used by BPF_LINK_CREATE command */
                  __u32           prog_fd;
                  union {
                          __u32           target_fd;
                          __u32           target_ifindex;
                  };
                  __u32           attach_type;
                  __u32           flags;
                  union {
                          __u32           target_btf_id;
                          struct {
                                  __aligned_u64   iter_info;
                                  __u32           iter_info_len;
                          };
                          struct {
                                  __u64           bpf_cookie;
                          } perf_event;
                          struct {
                                  __aligned_u64   info;
                                  __u32           info_len;
                                  __aligned_u64   bpf_cookie;
                          } iter;
                 };
          } link_create;

The good property here is that we can keep easily extending link
type-specific sections with extra fields where needed. For common
stuff like bpf_cookie it's suboptimal because we'll need to duplicate
field definition in each struct inside that union, but I think that's
fine. From end-user point of view, they will know which type of link
they are creating, so the use will be straightforward. This is why I
went with this approach. But let's consider alternatives.

2. Non-backwards compatible layout but extra flag to specify that new
field layout is used.

          struct { /* struct used by BPF_LINK_CREATE command */
                  __u32           prog_fd;
                  union {
                          __u32           target_fd;
                          __u32           target_ifindex;
                  };
                  __u32           attach_type;
                  __u32           flags; /* this will start supporting
some new flag like BPF_F_LINK_CREATE_NEW */
                  __u64           bpf_cookie; /* common field now */
                  union { /* this parts is effectively deprecated now */
                          __u32           target_btf_id;
                          struct {
                                  __aligned_u64   iter_info;
                                  __u32           iter_info_len;
                          };
                          struct { /* this is new layout, but needs
BPF_F_LINK_CREATE_NEW, at least for ext/ and bpf_iter/ programs */
                              __u64       bpf_cookie;
                              union {
                                  struct {
                                      __u32     target_btf_id;
                                  } ext;
                                  struct {
                                      __aligned_u64 info;
                                      __u32         info_len;
                                  } iter;
                              }
                          }
                  };
          } link_create;

This makes bpf_cookie a common field, but at least for EXT (freplace/)
and ITER (bpf_iter/) links we need to specify extra flag to specify
that we are not using iter_info/iter_info_len/target_btf_id. bpf_iter
then will use iter.info and iter.info_len, and can use plain
bpf_cookie.

IMO, this is way too confusing and a maintainability nightmare.

I'm trying to guess what you are proposing, I can read it two ways,
but let me know if I missed something.

3. Just add bpf_cookie field before link type-specific section.

          struct { /* struct used by BPF_LINK_CREATE command */
                  __u32           prog_fd;
                  union {
                          __u32           target_fd;
                          __u32           target_ifindex;
                  };
                  __u32           attach_type;
                  __u32           flags;
                  __u64           bpf_cookie;  // <<<<<<<<<< HERE
                  union {
                          __u32           target_btf_id;
                          struct {
                                  __aligned_u64   iter_info;
                                  __u32           iter_info_len;
                          };
                  };
          } link_create;

This looks really nice and would be great, but that changes offsets
for target_btf_id/iter_info/iter_info_len, so a no go. The only way to
rectify this is what proposal #2 above does with an extra flag.

4. Add bpf_cookie after link-type specific part:

          struct { /* struct used by BPF_LINK_CREATE command */
                  __u32           prog_fd;
                  union {
                          __u32           target_fd;
                          __u32           target_ifindex;
                  };
                  __u32           attach_type;
                  __u32           flags;
                  union {
                          __u32           target_btf_id;
                          struct {
                                  __aligned_u64   iter_info;
                                  __u32           iter_info_len;
                          };
                          struct {
                  };
                  __u64           bpf_cookie; // <<<<<<<<<<<<<<<<<< HERE
          } link_create;

This could work. But we are wasting 16 bytes currently used for
target_btf_id/iter_info/iter_info_len. If we later need to do
something link type-specific, we can add it to the existing union if
we need <= 16 bytes, otherwise we'll need to start another union after
bpf_cookie, splitting this into two link type-specific sections.

Overall, this might work, especially assuming we won't need to extend
iter-specific portions. But I really hate that we didn't do named
structs inside that union (i.e., ext.target_btf_id and
iter.info/iter.info_len) and I'd like to rectify that in the follow up
patches with named structs duplicating existing field layout, but with
proper naming. But splitting this LINK_CREATE bpf_attr part into two
unions would make it hard and awkward in the future.

So, thoughts? Did you have something else in mind that I missed?

What I proposed is your option 4. Yes, in the future if there is there
are something we want to add to bpf iter, we can add to iter_info, so
it should not be an issue. Any other new link_type may utilized the same
union with
     struct {
        __aligned_u64  new_type_info;
        __u32          new_type_info_len;
     };
and this will put extensibility into new_type_info.
I know this may be a little bit hassle but it should work.


I see what you mean. With this extra pointer we shouldn't need more
than 16 bytes per link type. That's unnecessary complication for a lot
of simpler types of links, unfortunately, though definitely an option.

We could have also done approach #4 but maybe leave 16-32 bytes before
bpf_cookie for the union, so that it's much less likely that we'll run
out of space there. Not very clean either, so I don't know.

I'll keep it here for discussion for now, let's see if anyone has
strong preferences and opinions.

Your option 1 should work too, which is what I proposed in the beginning
to put into the union and we can feel free to add bpf_cookie for each
individual link type. This is actually cleaner.

Oh, you did? I must have misunderstood then. If you like approach #1,
then it's what I'm doing right now, so let's keep it as is and let's
see if anyone else has preferences.

Just checked old emails. It is actually my misunderstanding.
I probably mismatched "{" and "}" and thought you placed
outside the union and made the suggestion. So never mind,
we are on the same page :-)





Sometime like

__u64   user_ctx;

instead of

struct {
          __u64   user_ctx;
} perf_event;


I decided to not do that in this patch set, though, to not distract
from the main goal. But I think we should avoid this shared field
"namespace" across different link types going forward.


                 };
         } link_create;

[...]