On Fri, Aug 30, 2024 at 6:34 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote: > > On Fri, Aug 30, 2024 at 12:12:09PM +0200, Oleg Nesterov wrote: > > The whole discussion was very confusing (yes, I too contributed to the > > confusion ;), let me try to summarise. > > > > > U(ret)probes are designed to be filterable using the PID, which is the > > > second parameter in the perf_event_open syscall. Currently, uprobe works > > > well with the filtering, but uretprobe is not affected by it. > > > > And this is correct. But the CONFIG_BPF_EVENTS code in __uprobe_perf_func() > > misunderstands the purpose of uprobe_perf_filter(). > > > > Lets forget about BPF for the moment. It is not that uprobe_perf_filter() > > does the filtering by the PID, it doesn't. We can simply kill this function > > and perf will work correctly. The perf layer in __uprobe_perf_func() does > > the filtering when perf_event->hw.target != NULL. > > > > So why does uprobe_perf_filter() call uprobe_perf_filter()? Not to avoid > > the __uprobe_perf_func() call (as the BPF code assumes), but to trigger > > unapply_uprobe() in handler_chain(). > > > > Suppose you do, say, > > > > $ perf probe -x /path/to/libc some_hot_function > > or > > $ perf probe -x /path/to/libc some_hot_function%return > > then > > $perf record -e ... -p 1 > > > > to trace the usage of some_hot_function() in the init process. Everything > > will work just fine if we kill uprobe_perf_filter()->uprobe_perf_filter(). > > > > But. If INIT forks a child C, dup_mm() will copy int3 installed by perf. > > So the child C will hit this breakpoint and cal handle_swbp/etc for no > > reason every time it calls some_hot_function(), not good. > > > > That is why uprobe_perf_func() calls uprobe_perf_filter() which returns > > UPROBE_HANDLER_REMOVE when C hits the breakpoint. handler_chain() will > > call unapply_uprobe() which will remove this breakpoint from C->mm. > > thanks for the info, I wasn't aware this was the intention > > uprobe_multi does not have perf event mechanism/check, so it's using > the filter function to do the process filtering.. which is not working > properly as you pointed out earlier So this part I don't completely get. I get that using task->mm comparison is wrong due to CLONE_VM, but why same_thread_group() check is wrong? I.e., why task->signal comparison is wrong? Sorry, I tried to follow the thread, but there were a lot of emails with a bunch of confusion, so I'm not sure I got all the details. > > > > > > We found that the filter function was not invoked when uretprobe was > > > initially implemented, and this has been existing for ten years. > > > > See above, this is correct. > > > > Note also that if you only use perf-probe + perf-record, no matter how > > many instances, you can even add BUG_ON(!uprobe_perf_filter(...)) into > > uretprobe_perf_func(). IIRC, perf doesn't use create_local_trace_uprobe(). > > > > --------------------------------------------------------------------------- > > Now lets return to BPF and this particular problem. I won't really argue > > with this patch, but > > > > - Please change the subject and update the changelog, > > "Fixes: c1ae5c75e103" and the whole reasoning is misleading > > and wrong, IMO. > > > > - This patch won't fix all problems because uprobe_perf_filter() > > filters by mm, not by pid. The changelog/patch assumes that it > > is a "PID filter", but it is not. > > > > See https://lore.kernel.org/linux-trace-kernel/20240825224018.GD3906@xxxxxxxxxx/ > > If the traced process does clone(CLONE_VM), bpftrace will hit the > > similar problem, with uprobe or uretprobe. > > > > - So I still think that the "right" fix should change the > > bpf_prog_run_array_uprobe() paths somehow, but I know nothing > > about bpf. > > to follow the perf event filter properly, bpf_prog_run_array_uprobe should > be called in perf_tp_event after the perf_tp_event_match call, but that's > already under preempt_disable, so that's why it's called before that > > jirka