Re: [PATCH] perf/bpf: Don't call bpf_overflow_handler() for tracing events

Kyle Huey <me@xxxxxxxxxxxx> · Mon, 15 Jul 2024 07:33:57 -0700

On Mon, Jul 15, 2024 at 4:12 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Sat, Jul 13, 2024 at 10:32:07PM +0200, Jiri Olsa wrote:
> > On Fri, Jul 12, 2024 at 09:46:45PM -0700, Kyle Huey wrote:
> > > The regressing commit is new in 6.10. It assumed that anytime event->prog
> > > is set bpf_overflow_handler() should be invoked to execute the attached bpf
> > > program. This assumption is false for tracing events, and as a result the
> > > regressing commit broke bpftrace by invoking the bpf handler with garbage
> > > inputs on overflow.
> > >
> > > Prior to the regression the overflow handlers formed a chain (of length 0,
> > > 1, or 2) and perf_event_set_bpf_handler() (the !tracing case) added
> > > bpf_overflow_handler() to that chain, while perf_event_attach_bpf_prog()
> > > (the tracing case) did not. Both set event->prog. The chain of overflow
> > > handlers was replaced by a single overflow handler slot and a fixed call to
> > > bpf_overflow_handler() when appropriate. This modifies the condition there
> > > to include !perf_event_is_tracing(), restoring the previous behavior and
> > > fixing bpftrace.
> > >
> > > Signed-off-by: Kyle Huey <khuey@xxxxxxxxxxxx>
> > > Reported-by: Joe Damato <jdamato@xxxxxxxxxx>
> > > Fixes: f11f10bfa1ca ("perf/bpf: Call BPF handler directly, not through overflow machinery")
> > > Tested-by: Joe Damato <jdamato@xxxxxxxxxx> # bpftrace
> > > Tested-by: Kyle Huey <khuey@xxxxxxxxxxxx> # bpf overflow handlers
> > > ---
> > >  kernel/events/core.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index 8f908f077935..f0d7119585dc 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -9666,6 +9666,8 @@ static inline void perf_event_free_bpf_handler(struct perf_event *event)
> > >   * Generic event overflow handling, sampling.
> > >   */
> > >
> > > +static bool perf_event_is_tracing(struct perf_event *event);
> > > +
> > >  static int __perf_event_overflow(struct perf_event *event,
> > >                              int throttle, struct perf_sample_data *data,
> > >                              struct pt_regs *regs)
> > > @@ -9682,7 +9684,9 @@ static int __perf_event_overflow(struct perf_event *event,
> > >
> > >     ret = __perf_event_account_interrupt(event, throttle);
> > >
> > > -   if (event->prog && !bpf_overflow_handler(event, data, regs))
> > > +   if (event->prog &&
> > > +       !perf_event_is_tracing(event) &&
> > > +       !bpf_overflow_handler(event, data, regs))
> > >             return ret;
> >
> > ok makes sense, it's better to follow the perf_event_set_bpf_prog condition
> >
> > Reviewed-by: Jiri Olsa <jolsa@xxxxxxxxxx>
>
> Urgh, so wth does event_is_tracing do with event->prog? And can't we
> clean this up?

Tracing events keep track of the bpf program in event->prog solely for
cleanup. The bpf programs are stored in and invoked from
event->tp_event->prog_array, but when the event is destroyed it needs
to know which bpf program to remove from that array.

> That whole perf_event_is_tracing() is a pretty gross function.
>
> Also, I think the default return value of bpf_overflow_handler() is
> wrong -- note how if !event->prog we won't call bpf_overflow_handler(),
> but if we do call it, but then have !event->prog on the re-read, we
> still return 0.

The synchronization model here isn't quite clear to me but I don't
think this matters in practice. Once event->prog is set the only
allowed change is for it to be cleared when the perf event is freed.
Anything else is refused by perf_event_set_bpf_handler() with EEXIST.
Can that free race with an overflow handler? I'm not sure, but even if
it can, dropping an overflow for an event that's being freed seems
fine to me. If it can't race then we could remove the condition on the
re-read entirely.

- Kyle