On Wed, Aug 7, 2024 at 2:44 PM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> On Tue, 6 Aug 2024 10:26:25 -0700
> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> > On Tue, Aug 6, 2024 at 12:37 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > >
> > > On Mon, Aug 05, 2024 at 01:28:03PM -0700, Andrii Nakryiko wrote:
> > > > The trace_uprobe->nhit counter is not incremented atomically, so its
> > > > value is bogus in practice. On the other hand, it's actually a pretty
> > > > big uprobe scalability problem due to heavy cache line bouncing
> > > > between CPUs triggering the same uprobe.
> > >
> > > so you're seeing that in the benchmark, right? I'm curious how bad
> > > the numbers are
> > >
> >
> > Yes. So, once we get rid of all the uprobe/uretprobe/mm locks (ongoing
> > work), this one was the last limiter to linear scalability.
> >
> > With this counter, I was topping out at about 12 mln/s uprobe
> > triggerings (I think it was 32 CPUs, but I don't remember exactly now).
> > About 30% of CPU cycles were spent in this increment.
> >
> > But those 30% don't paint the full picture. Once the counter is
> > removed, the same uprobe throughput jumps to 62 mln/s or so. So we
> > definitely have to do something about it.
> >
> > >
> > > > Drop it and emit an obviously unrealistic value in its stead in the
> > > > uprobe_profiler seq file.
> > > >
> > > > The alternative would be allocating a per-CPU counter, but I'm not
> > > > sure it's justified.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > > > ---
> > > >  kernel/trace/trace_uprobe.c | 4 +---
> > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > >
> > > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > > index 52e76a73fa7c..5d38207db479 100644
> > > > --- a/kernel/trace/trace_uprobe.c
> > > > +++ b/kernel/trace/trace_uprobe.c
> > > > @@ -62,7 +62,6 @@ struct trace_uprobe {
> > > >  	struct uprobe		*uprobe;
> > > >  	unsigned long		offset;
> > > >  	unsigned long		ref_ctr_offset;
> > > > -	unsigned long		nhit;
> > > >  	struct trace_probe	tp;
> > > >  };
> > > >
> > > > @@ -821,7 +820,7 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
> > > >
> > > >  	tu = to_trace_uprobe(ev);
> > > >  	seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
> > > > -		   trace_probe_name(&tu->tp), tu->nhit);
> > > > +		   trace_probe_name(&tu->tp), ULONG_MAX);
> > >
> > > seems harsh.. would it be that bad to create a per-CPU counter for that?
> >
> > Well, consider this patch a conversation starter. There are two
> > reasons why I'm removing the counter instead of doing a per-CPU one:
> >
> >   - it's less work to send out a patch pointing out the problem (but
> >     the solution might change);
> >   - this counter was never correct in the presence of multiple
> >     threads, so I'm not sure how useful it is.
> >
> > Yes, I think we can do per-CPU counters, but do we want to pay the
> > memory price? That's what I want to get from Masami, Steven, or Peter
> > (whoever cares enough).
>
> I would like to make it a per-CPU counter *and* make it Kconfig
> optional. Or just remove it together with the file (but that changes
> the user interface without giving users an option).
>
> For kprobes, the profile file is useful because it shows the "missed"
> counter. This tells the user whether the trace data drops some events
> or not. But if the uprobes profile only shows the number of hits, we
> can use the histogram trigger instead if needed.
>

I really don't like extra Kconfig options for something like this. So
I'll just go ahead and switch this to per-CPU counters unconditionally.
For BPF, this is unnecessary memory overhead, but as we shift to
multi-uprobe (which doesn't go through trace_uprobe logic), it probably
isn't a big deal over the longer term.

> Thank you,
>
> > >
> > > jirka
> > >
> > > >  	return 0;
> > > >  }
> > > >
> > > > @@ -1507,7 +1506,6 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
> > > >  	int ret = 0;
> > > >
> > > >  	tu = container_of(con, struct trace_uprobe, consumer);
> > > > -	tu->nhit++;
> > > >
> > > >  	udd.tu = tu;
> > > >  	udd.bp_addr = instruction_pointer(regs);
> > > > --
> > > > 2.43.5
> > > >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>
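
For reference, a minimal sketch of what the unconditional per-CPU
variant discussed above could look like, assuming the standard kernel
percpu API (alloc_percpu(), this_cpu_inc(), per_cpu_ptr()). The field
name nhits and the exact placement of the fragments are illustrative
assumptions, not the actual follow-up patch:

/*
 * Illustrative sketch only: replace the shared nhit field with a
 * per-CPU counter so CPUs hitting the same uprobe stop bouncing a
 * single cache line between them.
 */
struct trace_uprobe {
	struct uprobe *uprobe;
	unsigned long offset;
	unsigned long ref_ctr_offset;
	unsigned long __percpu *nhits;	/* assumed replacement for nhit */
	struct trace_probe tp;
};

/* at trace_uprobe allocation time: */
	tu->nhits = alloc_percpu(unsigned long);
	if (!tu->nhits)
		return ERR_PTR(-ENOMEM);
	/* ... with a matching free_percpu(tu->nhits) on teardown */

/* hot path in uprobe_dispatcher(): lockless, cache-local increment */
	this_cpu_inc(*tu->nhits);

/* in probes_profile_seq_show(): fold per-CPU values into one total */
	unsigned long nhits = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		nhits += *per_cpu_ptr(tu->nhits, cpu);

	seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
		   trace_probe_name(&tu->tp), nhits);

With this shape, the hot path stays lock-free and touches only the
local CPU's cache line, and the cross-CPU summation cost is paid only
when the profile file is actually read. A reader can observe a slightly
stale total, but that matches the best-effort accuracy of the original
non-atomic counter.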