On Wed, Aug 7, 2024 at 2:44 PM Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> On Tue, 6 Aug 2024 10:26:25 -0700
> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> > On Tue, Aug 6, 2024 at 12:37 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > >
> > > On Mon, Aug 05, 2024 at 01:28:03PM -0700, Andrii Nakryiko wrote:
> > > > The trace_uprobe->nhit counter is not incremented atomically, so its
> > > > value is bogus in practice. On the other hand, it's actually a pretty
> > > > big uprobe scalability problem due to heavy cache line bouncing
> > > > between CPUs triggering the same uprobe.
> > >
> > > so you're seeing that in the benchmark, right? I'm curious how bad
> > > the numbers are
> > >
> >
> > Yes. So, once we get rid of all the uprobe/uretprobe/mm locks (ongoing
> > work), this one was the last limiter to linear scalability.
> >
> > With this counter, I was topping out at about 12 mln/s uprobe
> > triggerings (I think it was 32 CPUs, but I don't remember exactly now).
> > About 30% of CPU cycles were spent in this increment.
> >
> > But those 30% don't paint the full picture. Once the counter is
> > removed, the same uprobe throughput jumps to 62 mln/s or so. So we
> > definitely have to do something about it.
> >
> > >
> > > > Drop it and emit an obviously unrealistic value in its stead in the
> > > > uprobe_profiler seq file.
> > > >
> > > > The alternative would be allocating a per-CPU counter, but I'm not
> > > > sure it's justified.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > > > ---
> > > >  kernel/trace/trace_uprobe.c | 4 +---
> > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > >
> > > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > > index 52e76a73fa7c..5d38207db479 100644
> > > > --- a/kernel/trace/trace_uprobe.c
> > > > +++ b/kernel/trace/trace_uprobe.c
> > > > @@ -62,7 +62,6 @@ struct trace_uprobe {
> > > >  	struct uprobe		*uprobe;
> > > >  	unsigned long		offset;
> > > >  	unsigned long		ref_ctr_offset;
> > > > -	unsigned long		nhit;
> > > >  	struct trace_probe	tp;
> > > >  };
> > > >
> > > > @@ -821,7 +820,7 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
> > > >
> > > >  	tu = to_trace_uprobe(ev);
> > > >  	seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
> > > > -		   trace_probe_name(&tu->tp), tu->nhit);
> > > > +		   trace_probe_name(&tu->tp), ULONG_MAX);
> > >
> > > seems harsh.. would it be that bad to create a per-CPU counter for that?
> >
> > Well, consider this patch a conversation starter. There are two
> > reasons why I'm removing the counter instead of doing a per-CPU one:
> >
> >   - it's less work to send out a patch pointing out the problem (but
> >     the solution might change);
> >   - this counter was never correct in the presence of multiple
> >     threads, so I'm not sure how useful it is.
> >
> > Yes, I think we can do per-CPU counters, but do we want to pay the
> > memory price? That's what I want to get from Masami, Steven, or Peter
> > (whoever cares enough).
>
> I would like to make it a per-CPU counter *and* make it Kconfig
> optional. Or just remove it together with the file (but that changes
> the user interface without giving users an option).
>
> For kprobes, the profile file is useful because it shows the "missed"
> counter. This tells the user whether the trace data drops some events
> or not. But if the uprobes profile only shows the number of hits, we
> can use the histogram trigger instead if needed.
>

I really don't like extra Kconfig options for something like this. So
I'll just go ahead and switch this to per-CPU counters unconditionally.
For BPF, this is unnecessary memory overhead, but as we shift to
multi-uprobe (which doesn't go through trace_uprobe logic), it probably
isn't a big deal over the longer term.

> Thank you,
>
> > >
> > > jirka
> > >
> > > >  	return 0;
> > > >  }
> > > >
> > > > @@ -1507,7 +1506,6 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
> > > >  	int ret = 0;
> > > >
> > > >  	tu = container_of(con, struct trace_uprobe, consumer);
> > > > -	tu->nhit++;
> > > >
> > > >  	udd.tu = tu;
> > > >  	udd.bp_addr = instruction_pointer(regs);
> > > > --
> > > > 2.43.5
> > > >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>
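
For reference, a minimal sketch of what the unconditional per-CPU
variant discussed above could look like, assuming the standard kernel
percpu API (alloc_percpu(), this_cpu_inc(), per_cpu_ptr()). The field
name nhits and the exact placement of the fragments are illustrative
assumptions, not the actual follow-up patch:

/*
 * Illustrative sketch only: replace the shared nhit field with a
 * per-CPU counter so CPUs hitting the same uprobe stop bouncing a
 * single cache line between them.
 */
struct trace_uprobe {
	struct uprobe *uprobe;
	unsigned long offset;
	unsigned long ref_ctr_offset;
	unsigned long __percpu *nhits;	/* assumed replacement for nhit */
	struct trace_probe tp;
};

/* at trace_uprobe allocation time: */
	tu->nhits = alloc_percpu(unsigned long);
	if (!tu->nhits)
		return ERR_PTR(-ENOMEM);
	/* ... with a matching free_percpu(tu->nhits) on teardown */

/* hot path in uprobe_dispatcher(): lockless, cache-local increment */
	this_cpu_inc(*tu->nhits);

/* in probes_profile_seq_show(): fold per-CPU values into one total */
	unsigned long nhits = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		nhits += *per_cpu_ptr(tu->nhits, cpu);

	seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
		   trace_probe_name(&tu->tp), nhits);

With this shape, the hot path stays lock-free and touches only the
local CPU's cache line, and the cross-CPU summation cost is paid only
when the profile file is actually read. A reader can observe a slightly
stale total, but that matches the best-effort accuracy of the original
non-atomic counter.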