Re: [PATCH bpf] bpf: Add LBR data to BPF_PROG_TYPE_PERF_EVENT prog context

"Daniel Xu" <dxu@xxxxxxxxx> · Mon, 16 Dec 2019 11:29:03 -0800

On Fri Dec 6, 2019 at 9:10 AM, Andrii Nakryiko wrote:
> On Thu, Dec 5, 2019 at 4:13 PM Daniel Xu <dxu@xxxxxxxxx> wrote:
> >
> > Last-branch-record is an intel CPU feature that can be configured to
> > record certain branches that are taken during code execution. This data
> > is particularly interesting for profile guided optimizations. perf has
> > had LBR support for a while but the data collection can be a bit coarse
> > grained.
> >
> > We (Facebook) have recently run a lot of experiments with feeding
> > filtered LBR data to various PGO pipelines. We've seen really good
> > results (+2.5% throughput with lower cpu util and lower latency) by
> > feeding high request latency LBR branches to the compiler on a
> > request-oriented service. We used bpf to read a special request context
> > ID (which is how we associate branches with latency) from a fixed
> > userspace address. Reading from the fixed address is why bpf support is
> > useful.
> >
> > Aside from this particular use case, having LBR data available to bpf
> > progs can be useful to get stack traces out of userspace applications
> > that omit frame pointers.
> >
> > This patch adds support for LBR data to bpf perf progs.
> >
> > Some notes:
> > * We use `__u64 entries[BPF_MAX_LBR_ENTRIES * 3]` instead of
> >   `struct perf_branch_entry[BPF_MAX_LBR_ENTRIES]` because checkpatch.pl
> >   warns about including a uapi header from another uapi header
> >
> > * We define BPF_MAX_LBR_ENTRIES as 32 (instead of using the value from
> >   arch/x86/events/perf_events.h) because including arch specific headers
> >   seems wrong and could introduce circular header includes.
> >
> > Signed-off-by: Daniel Xu <dxu@xxxxxxxxx>
> > ---
> >  include/uapi/linux/bpf_perf_event.h |  5 ++++
> >  kernel/trace/bpf_trace.c            | 39 +++++++++++++++++++++++++++++
> >  2 files changed, 44 insertions(+)
> >
> > diff --git a/include/uapi/linux/bpf_perf_event.h b/include/uapi/linux/bpf_perf_event.h
> > index eb1b9d21250c..dc87e3d50390 100644
> > --- a/include/uapi/linux/bpf_perf_event.h
> > +++ b/include/uapi/linux/bpf_perf_event.h
> > @@ -10,10 +10,15 @@
> >
> >  #include <asm/bpf_perf_event.h>
> >
> > +#define BPF_MAX_LBR_ENTRIES 32
> > +
> >  struct bpf_perf_event_data {
> >         bpf_user_pt_regs_t regs;
> >         __u64 sample_period;
> >         __u64 addr;
> > +       __u64 nr_lbr;
> > +       /* Cast to struct perf_branch_entry* before using */
> > +       __u64 entries[BPF_MAX_LBR_ENTRIES * 3];
> >  };
> >
>
> 
> I wonder if instead of hard-coding this in bpf_perf_event_data, could
> we achieve this and perhaps even more flexibility by letting users
> access underlying bpf_perf_event_data_kern and use CO-RE to read
> whatever needs to be read from perf_sample_data, perf_event, etc?
> Would that work?
>
> 
> >  #endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ffc91d4935ac..96ba7995b3d7 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
>
> 
> [...]
>

Sorry about the late response. I chatted w/ Andrii last week and spent
some time playing with alternatives. It turns out we can read lbr data
by casting the bpf_perf_event_data to the internal kernel datastructure
and doing some well placed bpf_probe_read's.

Unless someone else thinks this patch would be useful, I will probably
abandon it for now (unless we experience enough pain from doing these
casts). If I did a v2, I would probably add a bpf helper instead of
modifying the ctx to get around the ugly api limitations.

Daniel