On Mon, Aug 26, 2024 at 01:57:52PM +0200, Oleg Nesterov wrote: > On 08/26, Jiri Olsa wrote: > > > > On Mon, Aug 26, 2024 at 12:40:18AM +0200, Oleg Nesterov wrote: > > > $ ./test & > > > $ bpftrace -p $! -e 'uprobe:./test:func { printf("%d\n", pid); }' > > > > > > I hope that the syntax of the 2nd command is correct... > > > > > > I _think_ that it will print 2 pids too. > > > > yes.. but with CLONE_VM both processes share 'mm' > > Yes sure, > > > so they are threads, > > Well this depends on definition ;) but the CLONE_VM child is not a sub-thread, > it has another TGID. See below. > > > and at least uprobe_multi filters by process [1] now.. ;-) > > OK, if you say that this behaviour is fine I won't argue, I simply do not know. > But see below. > > > > But "perf-record -p" works as expected. > > > > I wonder it's because there's the perf layer that schedules each > > uprobe event only when its process (PID1/2) is scheduled in and will > > receive events only from that cpu while the process is running on it > > Not sure I understand... The task which hits the breakpoint is always > current, it is always scheduled in. hum, I might be missing something, but ;-) assuming we have 2 tasks, each with perf uprobe event assigned in perf path there's uprobe_perf_func which calls the uprobe_perf_filter and if it returns true it then goes: uprobe_perf_func __uprobe_perf_func perf_trace_buf_submit perf_tp_event { hlist_for_each_entry_rcu(event, head, hlist_entry) { if (perf_tp_event_match(event, &data, regs)) { perf_swevent_event(event, count, &data, regs); } so it will store the event only to perf event which is added for the task that is currently scheduled in, so it's not stored to the other task event in comparison with uprobe_multi path, where uprobe_multi_link_filter will pass and then it goes: handler_chain uprobe_multi_link_handler bpf_prog_run - executes bpf program the bpf program will get executed for uprobe from both tasks > > The main purpose of uprobe_perf_func()->uprobe_perf_filter() is NOT that > we want to avoid __uprobe_perf_func() although this makes sense. > > The main purpose is that we want to remove the breakpoints in current->mm > when uprobe_perf_filter() returns false, that is why UPROBE_HANDLER_REMOVE. > IOW, the main purpose is not penalise user-space unnecessarily. > > IIUC (but I am not sure), perf-record -p will work "correctly" even if we > remove uprobe_perf_filter() altogether. IIRC the perf layer does its own > filtering but I forgot everything. I think that's what I tried to describe above > > And this makes me think that perhaps BPF can't rely on uprobe_perf_filter() > either, even we forget about ret-probes. > > > [1] 46ba0e49b642 bpf: fix multi-uprobe PID filtering logic > > Looks obviously wrong... get_pid_task(PIDTYPE_TGID) can return a zombie > leader with ->mm == NULL while other threads and thus the whole process > is still alive. > > And again, the changelog says "the intent for PID filtering it to filter by > *process*", but clone(CLONE_VM) creates another process, not a thread. > > So perhaps we need > > - if (link->task && current->mm != link->task->mm) > + if (link->task && !same_thread_group(current, link->task)) that seems correct, need to check > > in uprobe_prog_run() to make "filter by *process*" true, but this won't > fix the problem with link->task->mm == NULL in uprobe_multi_link_filter(). > > > Does bpftrace use bpf_uprobe_multi_link_attach/etc ? I guess not... > Then which userspace tool uses this code? ;) yes, it will trigger if you attach to multiple uprobes, like with your test example with: # bpftrace -p xxx -e 'uprobe:./ex:func* { printf("%d\n", pid); }' thanks, jirka