Re: [PATCH v2] tracing/uprobe: Add missing PID filter for uretprobe

Jiri Olsa <olsajiri@xxxxxxxxx> · Fri, 30 Aug 2024 15:34:06 +0200

On Fri, Aug 30, 2024 at 12:12:09PM +0200, Oleg Nesterov wrote:
> The whole discussion was very confusing (yes, I too contributed to the
> confusion ;), let me try to summarise.
> 
> > U(ret)probes are designed to be filterable using the PID, which is the
> > second parameter in the perf_event_open syscall. Currently, uprobe works
> > well with the filtering, but uretprobe is not affected by it.
> 
> And this is correct. But the CONFIG_BPF_EVENTS code in __uprobe_perf_func()
> misunderstands the purpose of uprobe_perf_filter().
> 
> Lets forget about BPF for the moment. It is not that uprobe_perf_filter()
> does the filtering by the PID, it doesn't. We can simply kill this function
> and perf will work correctly. The perf layer in __uprobe_perf_func() does
> the filtering when perf_event->hw.target != NULL.
> 
> So why does uprobe_perf_filter() call uprobe_perf_filter()? Not to avoid
> the __uprobe_perf_func() call (as the BPF code assumes), but to trigger
> unapply_uprobe() in handler_chain().
> 
> Suppose you do, say,
> 
> 	$ perf probe -x /path/to/libc some_hot_function
> or
> 	$ perf probe -x /path/to/libc some_hot_function%return
> then
> 	$perf record -e ... -p 1
> 
> to trace the usage of some_hot_function() in the init process. Everything
> will work just fine if we kill uprobe_perf_filter()->uprobe_perf_filter().
> 
> But. If INIT forks a child C, dup_mm() will copy int3 installed by perf.
> So the child C will hit this breakpoint and cal handle_swbp/etc for no
> reason every time it calls some_hot_function(), not good.
> 
> That is why uprobe_perf_func() calls uprobe_perf_filter() which returns
> UPROBE_HANDLER_REMOVE when C hits the breakpoint. handler_chain() will
> call unapply_uprobe() which will remove this breakpoint from C->mm.

thanks for the info, I wasn't aware this was the intention

uprobe_multi does not have perf event mechanism/check, so it's using
the filter function to do the process filtering.. which is not working
properly as you pointed out earlier

> 
> > We found that the filter function was not invoked when uretprobe was
> > initially implemented, and this has been existing for ten years.
> 
> See above, this is correct.
> 
> Note also that if you only use perf-probe + perf-record, no matter how
> many instances, you can even add BUG_ON(!uprobe_perf_filter(...)) into
> uretprobe_perf_func(). IIRC, perf doesn't use create_local_trace_uprobe().
> 
> ---------------------------------------------------------------------------
> Now lets return to BPF and this particular problem. I won't really argue
> with this patch, but
> 
> 	- Please change the subject and update the changelog,
> 	  "Fixes: c1ae5c75e103" and the whole reasoning is misleading
> 	  and wrong, IMO.
> 
> 	- This patch won't fix all problems because uprobe_perf_filter()
> 	  filters by mm, not by pid. The changelog/patch assumes that it
> 	  is a "PID filter", but it is not.
> 
> 	  See https://lore.kernel.org/linux-trace-kernel/20240825224018.GD3906@xxxxxxxxxx/
> 	  If the traced process does clone(CLONE_VM), bpftrace will hit the
> 	  similar problem, with uprobe or uretprobe.
> 
> 	- So I still think that the "right" fix should change the
> 	  bpf_prog_run_array_uprobe() paths somehow, but I know nothing
> 	  about bpf.

to follow the perf event filter properly, bpf_prog_run_array_uprobe should
be called in perf_tp_event after the perf_tp_event_match call, but that's
already under preempt_disable, so that's why it's called before that

jirka