On Fri, Aug 30, 2024 at 08:51:12AM -0700, Andrii Nakryiko wrote:
> On Fri, Aug 30, 2024 at 6:34 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> >
> > On Fri, Aug 30, 2024 at 12:12:09PM +0200, Oleg Nesterov wrote:
> > > The whole discussion was very confusing (yes, I too contributed to the
> > > confusion ;), let me try to summarise.
> > >
> > > > U(ret)probes are designed to be filterable using the PID, which is the
> > > > second parameter in the perf_event_open syscall. Currently, uprobe works
> > > > well with the filtering, but uretprobe is not affected by it.
> > >
> > > And this is correct. But the CONFIG_BPF_EVENTS code in __uprobe_perf_func()
> > > misunderstands the purpose of uprobe_perf_filter().
> > >
> > > Let's forget about BPF for the moment. It is not that uprobe_perf_filter()
> > > does the filtering by the PID, it doesn't. We can simply kill this function
> > > and perf will work correctly. The perf layer in __uprobe_perf_func() does
> > > the filtering when perf_event->hw.target != NULL.
> > >
> > > So why does uprobe_perf_func() call uprobe_perf_filter()? Not to avoid
> > > the __uprobe_perf_func() call (as the BPF code assumes), but to trigger
> > > unapply_uprobe() in handler_chain().
> > >
> > > Suppose you do, say,
> > >
> > >	$ perf probe -x /path/to/libc some_hot_function
> > > or
> > >	$ perf probe -x /path/to/libc some_hot_function%return
> > > then
> > >	$ perf record -e ... -p 1
> > >
> > > to trace the usage of some_hot_function() in the init process. Everything
> > > will work just fine if we kill uprobe_perf_func()->uprobe_perf_filter().
> > >
> > > But. If INIT forks a child C, dup_mm() will copy int3 installed by perf.
> > > So the child C will hit this breakpoint and call handle_swbp/etc for no
> > > reason every time it calls some_hot_function(), not good.
> > >
> > > That is why uprobe_perf_func() calls uprobe_perf_filter() which returns
> > > UPROBE_HANDLER_REMOVE when C hits the breakpoint.
> > > handler_chain() will call unapply_uprobe() which will remove this
> > > breakpoint from C->mm.
> >
> > thanks for the info, I wasn't aware this was the intention
> >
> > uprobe_multi does not have perf event mechanism/check, so it's using
> > the filter function to do the process filtering.. which is not working
> > properly as you pointed out earlier
>
> So this part I don't completely get. I get that using task->mm
> comparison is wrong due to CLONE_VM, but why same_thread_group() check
> is wrong? I.e., why task->signal comparison is wrong?

the way I understand it is that we take the group leader task and
store it in bpf_uprobe_multi_link::task

but it can exit while the rest of the threads are still running, so
the uprobe_multi_link_filter won't match them (leader->mm is NULL)

Oleg suggested the change below (in addition to the same_thread_group
change) to take that into account

jirka


---
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 98e395f1baae..9e6b390aa6da 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3235,9 +3235,23 @@ uprobe_multi_link_filter(struct uprobe_consumer *con, enum uprobe_filter_ctx ctx
 			 struct mm_struct *mm)
 {
 	struct bpf_uprobe *uprobe;
+	struct task_struct *task, *t;
+	bool ret = false;
 
 	uprobe = container_of(con, struct bpf_uprobe, consumer);
-	return uprobe->link->task->mm == mm;
+	task = uprobe->link->task;
+
+	rcu_read_lock();
+	for_each_thread(task, t) {
+		struct mm_struct *t_mm = READ_ONCE(t->mm);
+		if (t_mm) {
+			ret = t_mm == mm;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return ret;
 }
 
 static int
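
To make the exited-leader case concrete, here is a minimal userspace sketch of the same idea. It is not kernel code: struct mm_model, struct task_model, leader_uses_mm() and group_uses_mm() are hypothetical stand-ins made up for illustration, and there is no RCU or READ_ONCE() here. It only assumes that an exited thread has mm == NULL and that threads[0] is the group leader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types; not kernel code. */
struct mm_model { int id; };

struct task_model {
	struct mm_model *mm;	/* NULL once the thread has exited */
};

/* Old check: look only at the group leader, assumed to be threads[0]. */
static bool leader_uses_mm(const struct task_model *threads,
			   const struct mm_model *target_mm)
{
	assert(threads != NULL);
	return threads[0].mm == target_mm;
}

/*
 * Patched idea: walk the thread group, skip threads whose mm is already
 * NULL (such as an exited leader) and compare the first live thread's mm.
 * The thread's mm is kept in its own variable (t_mm) so it cannot be
 * confused with the mm we are asked about.
 */
static bool group_uses_mm(const struct task_model *threads, size_t n,
			  const struct mm_model *target_mm)
{
	assert(threads != NULL);
	for (size_t i = 0; i < n; i++) {
		const struct mm_model *t_mm = threads[i].mm;

		if (t_mm)
			return t_mm == target_mm;
	}
	return false;	/* every thread in the group has exited */
}
```

With a group whose leader has exited (mm == NULL) and whose two other threads still share one mm, leader_uses_mm() fails to match that mm while group_uses_mm() matches it, which is the situation described above.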