Re: [PATCH v4 3/5] perf/core: Account dropped samples from BPF

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 23 Oct 2024 14:24:13 -0700

On Wed, Oct 23, 2024 at 1:32 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> On Wed, Oct 23, 2024 at 12:13:31PM -0700, Andrii Nakryiko wrote:
> > On Wed, Oct 23, 2024 at 11:47 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > >
> > > Hello,
> > >
> > > On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> > > > On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > > >
> > > > > Like in the software events, the BPF overflow handler can drop samples
> > > > > by returning 0.  Let's count the dropped samples here too.
> > > > >
> > > > > Acked-by: Kyle Huey <me@xxxxxxxxxxxx>
> > > > > Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> > > > > Cc: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > > > > Cc: Song Liu <song@xxxxxxxxxx>
> > > > > Cc: bpf@xxxxxxxxxxxxxxx
> > > > > Signed-off-by: Namhyung Kim <namhyung@xxxxxxxxxx>
> > > > > ---
> > > > >  kernel/events/core.c | 4 +++-
> > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > > index 5d24597180dec167..b41c17a0bc19f7c2 100644
> > > > > --- a/kernel/events/core.c
> > > > > +++ b/kernel/events/core.c
> > > > > @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > > > >         ret = __perf_event_account_interrupt(event, throttle);
> > > > >
> > > > >         if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > > > > -           !bpf_overflow_handler(event, data, regs))
> > > > > +           !bpf_overflow_handler(event, data, regs)) {
> > > > > +               atomic64_inc(&event->dropped_samples);
> > > >
> > > > I don't see the full patch set (please cc relevant people and mailing
> > > > list on each patch in the patch set), but do we really want to pay the
> > >
> > > Sorry, you can find the whole series here.
> > >
> > > https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@xxxxxxxxxx
> > >
> > > I thought it's mostly for the perf part so I didn't CC bpf folks but
> > > I'll do in the next version.
> > >
> > >
> > > > price of atomic increment on what's the very typical situation of a
> > > > BPF program returning 0?
> > >
> > > Is it typical for BPF_PROG_TYPE_PERF_EVENT?  I guess TRACING programs
> > > usually return 0 but PERF_EVENT should care about the return values.
> > >
> >
> > Yeah, it's pretty much always `return 0;` for perf_event-based BPF
> > profilers. It's rather unusual to return non-zero, actually.
>
> Ok, then it may be local_t or plain unsigned long.  It should be
> updated on overflow only.  Read can be racy but I think it's ok to
> miss some numbers.  If someone really needs a precise count, they can
> read it after disabling the event IMHO.
>
> What do you think?
>

See [0] for unsynchronized increment absolutely killing the
performance due to cache line bouncing between CPUs. If the event is
high-frequency and can be triggered across multiple CPUs in short
succession, even an imprecise counter might be harmful.

In general, I'd say that if BPF attachment is involved, we probably
should avoid maintaining unnecessary statistics. Things like this
event->dropped_samples increment can be done very efficiently and
trivially from inside the BPF program, if at all necessary.

Having said that, if it's unlikely to have perf_event bouncing between
multiple CPUs, it's probably not that big of a deal.

  [0] https://lore.kernel.org/linux-trace-kernel/20240813203409.3985398-1-andrii@xxxxxxxxxx/

> Thanks,
> Namhyung
>