Re: [RFC PATCH bpf-next 1/2] bpf: Introduce global percpu data

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 16 Jan 2025 15:37:25 -0800

On Wed, Jan 15, 2025 at 11:22 PM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
>
> Hi,
>
> On 15/1/25 07:10, Andrii Nakryiko wrote:
> > On Mon, Jan 13, 2025 at 7:25 AM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
> >>
> >> This patch introduces global per-CPU data, inspired by commit
> >> 6316f78306c1 ("Merge branch 'support-global-data'"). It enables the
> >> definition of global per-CPU variables in BPF, similar to the
> >> DEFINE_PER_CPU() macro in the kernel[0].
> >>
> >> For example, in BPF, it is able to define a global per-CPU variable like
> >> this:
> >>
> >> int percpu_data SEC(".data..percpu");
> >>
> >> With this patch, tools like retsnoop[1] and bpflbr[2] can simplify their
> >> BPF code for handling LBRs. The code can be updated from
> >>
> >> static struct perf_branch_entry lbrs[1][MAX_LBR_ENTRIES] SEC(".data.lbrs");
> >>
> >> to
> >>
> >> static struct perf_branch_entry lbrs[MAX_LBR_ENTRIES] SEC(".data..percpu.lbrs");
> >>
> >> This eliminates the need to retrieve the CPU ID using the
> >> bpf_get_smp_processor_id() helper.
> >>
> >> Additionally, by reusing global per-CPU variables, sharing information
> >> between tail callers and callees or freplace callers and callees becomes
> >> simpler compared to using percpu_array maps.
> >>
> >> Links:
> >> [0] https://github.com/torvalds/linux/blob/fbfd64d25c7af3b8695201ebc85efe90be28c5a3/include/linux/percpu-defs.h#L114
> >> [1] https://github.com/anakryiko/retsnoop
> >> [2] https://github.com/Asphaltt/bpflbr
> >>
> >> Signed-off-by: Leon Hwang <leon.hwang@xxxxxxxxx>
> >> ---
> >>  kernel/bpf/arraymap.c  |  39 +++++++++++++-
> >>  kernel/bpf/verifier.c  |  45 +++++++++++++++++
> >>  tools/lib/bpf/libbpf.c | 112 ++++++++++++++++++++++++++++++++---------
> >>  3 files changed, 171 insertions(+), 25 deletions(-)
> >>
> >
> > So I think the feature overall makes sense, but we need to think
> > through at least libbpf's side of things some more. Unlike .data,
> > per-cpu .data section is not mmapable, and so that has implication on
> > BPF skeleton and we should make sure all that makes sense on BPF
> > skeleton side. In that sense, per-cpu global data is more akin to
> > struct_ops initialization image, which can be accessed by user before
> > skeleton is loaded to initialize the image.
> >
> > There are a few things to consider. What's the BPF skeleton interface?
> > Do we expose it as single struct and use that struct as initial image
> > for each CPU (which means user won't be able to initialize different
> > CPU data differently, at least not through BPF skeleton facilities)?
> > Or do we expose this as an array of structs and let user set each CPU
> > data independently?
> >
> > I feel like keeping it simple and having one image for all CPUs would
> > cover most cases. And users can still access the underlying
> > PERCPU_ARRAY map if they need more control.
>
> Agree. It is necessary to keep it simple.
>
> >
> > But either way, we need tests for skeleton, making sure we NULL-out
> > this per-cpu global data, but take it into account before the load.
> >
> > Also, this huge calloc for possible CPUs, I'd like to avoid it
> > altogether for the (probably very common) zero-initialized case.
>
> Ack.
>
> >
> > So in short, needs a bit of iteration to figure out all the
> > interfacing issues, but makes sense overall. See some more low-level
> > remarks below.
> >
>
> It is challenging to figure out them. I'll do my best to achieve it.
>
> > pw-bot: cr
> >
> >

[...]

> >
> >> @@ -516,6 +516,7 @@ struct bpf_struct_ops {
> >>  };
> >>
> >>  #define DATA_SEC ".data"
> >> +#define PERCPU_DATA_SEC ".data..percpu"
> >
> > I don't like this prefix, even if that's what we have in the kernel.
> > Something like just ".percpu" or ".percpu_data" or ".data_percpu" is
> > better, IMO.
>
> I tested ".percpu". It is OK to use it. But we have to update "bpftool
> gen" too, which relies on these section names.
>
> Is it better to keep ".data" prefix, like ".data.percpu", ".data_percpu"?
> Can keeping ".data" prefix reduce some works on bpftool, go-ebpf and
> akin bpf loaders?

It's literally two lines of code in gen.c, and that should actually be
a common array of known prefixes. Even if someone uses this new
.percpu section with old bpftool nothing will break, they just won't
have structure representing the initial per-CPU image. They will still
have the generic map pointer in the skeleton. So I think this is
acceptable.

I'd definitely go with a simple and less error-prone ".percpu" prefix.

[...]