On Tue, Aug 27, 2024 at 12:08:39PM +0200, Jiri Olsa wrote: > On Mon, Aug 26, 2024 at 01:57:52PM +0200, Oleg Nesterov wrote: > > On 08/26, Jiri Olsa wrote: > > > > > > On Mon, Aug 26, 2024 at 12:40:18AM +0200, Oleg Nesterov wrote: > > > > $ ./test & > > > > $ bpftrace -p $! -e 'uprobe:./test:func { printf("%d\n", pid); }' > > > > > > > > I hope that the syntax of the 2nd command is correct... > > > > > > > > I _think_ that it will print 2 pids too. > > > > > > yes.. but with CLONE_VM both processes share 'mm' > > > > Yes sure, > > > > > so they are threads, > > > > Well this depends on definition ;) but the CLONE_VM child is not a sub-thread, > > it has another TGID. See below. > > > > > and at least uprobe_multi filters by process [1] now.. ;-) > > > > OK, if you say that this behaviour is fine I won't argue, I simply do not know. > > But see below. > > > > > > But "perf-record -p" works as expected. > > > > > > I wonder it's because there's the perf layer that schedules each > > > uprobe event only when its process (PID1/2) is scheduled in and will > > > receive events only from that cpu while the process is running on it > > > > Not sure I understand... The task which hits the breakpoint is always > > current, it is always scheduled in. > > > > The main purpose of uprobe_perf_func()->uprobe_perf_filter() is NOT that > > we want to avoid __uprobe_perf_func() although this makes sense. > > > > The main purpose is that we want to remove the breakpoints in current->mm > > when uprobe_perf_filter() returns false, that is why UPROBE_HANDLER_REMOVE. > > IOW, the main purpose is not penalise user-space unnecessarily. > > > > IIUC (but I am not sure), perf-record -p will work "correctly" even if we > > remove uprobe_perf_filter() altogether. IIRC the perf layer does its own > > filtering but I forgot everything. > > > > And this makes me think that perhaps BPF can't rely on uprobe_perf_filter() > > either, even we forget about ret-probes. > > > > > [1] 46ba0e49b642 bpf: fix multi-uprobe PID filtering logic > > > > Looks obviously wrong... get_pid_task(PIDTYPE_TGID) can return a zombie > > leader with ->mm == NULL while other threads and thus the whole process > > is still alive. > > > > And again, the changelog says "the intent for PID filtering it to filter by > > *process*", but clone(CLONE_VM) creates another process, not a thread. > > > > So perhaps we need > > > > - if (link->task && current->mm != link->task->mm) > > + if (link->task && !same_thread_group(current, link->task)) > > > > in uprobe_prog_run() to make "filter by *process*" true, but this won't > > fix the problem with link->task->mm == NULL in uprobe_multi_link_filter(). > > would the same_thread_group(current, link->task) work in such case? > (zombie leader with other alive threads) should uprobe_perf_filter use same_thread_group as well instead of mm pointers check? jirka