Re: [PATCH v2] tracing/uprobe: Add missing PID filter for uretprobe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 30, 2024 at 6:34 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Fri, Aug 30, 2024 at 12:12:09PM +0200, Oleg Nesterov wrote:
> > The whole discussion was very confusing (yes, I too contributed to the
> > confusion ;), let me try to summarise.
> >
> > > U(ret)probes are designed to be filterable using the PID, which is the
> > > second parameter in the perf_event_open syscall. Currently, uprobe works
> > > well with the filtering, but uretprobe is not affected by it.
> >
> > And this is correct. But the CONFIG_BPF_EVENTS code in __uprobe_perf_func()
> > misunderstands the purpose of uprobe_perf_filter().
> >
> > Lets forget about BPF for the moment. It is not that uprobe_perf_filter()
> > does the filtering by the PID, it doesn't. We can simply kill this function
> > and perf will work correctly. The perf layer in __uprobe_perf_func() does
> > the filtering when perf_event->hw.target != NULL.
> >
> > So why does uprobe_perf_filter() call uprobe_perf_filter()? Not to avoid
> > the __uprobe_perf_func() call (as the BPF code assumes), but to trigger
> > unapply_uprobe() in handler_chain().
> >
> > Suppose you do, say,
> >
> >       $ perf probe -x /path/to/libc some_hot_function
> > or
> >       $ perf probe -x /path/to/libc some_hot_function%return
> > then
> >       $perf record -e ... -p 1
> >
> > to trace the usage of some_hot_function() in the init process. Everything
> > will work just fine if we kill uprobe_perf_filter()->uprobe_perf_filter().
> >
> > But. If INIT forks a child C, dup_mm() will copy int3 installed by perf.
> > So the child C will hit this breakpoint and cal handle_swbp/etc for no
> > reason every time it calls some_hot_function(), not good.
> >
> > That is why uprobe_perf_func() calls uprobe_perf_filter() which returns
> > UPROBE_HANDLER_REMOVE when C hits the breakpoint. handler_chain() will
> > call unapply_uprobe() which will remove this breakpoint from C->mm.
>
> thanks for the info, I wasn't aware this was the intention
>
> uprobe_multi does not have perf event mechanism/check, so it's using
> the filter function to do the process filtering.. which is not working
> properly as you pointed out earlier

So this part I don't completely get. I get that using task->mm
comparison is wrong due to CLONE_VM, but why same_thread_group() check
is wrong? I.e., why task->signal comparison is wrong?

Sorry, I tried to follow the thread, but there were a lot of emails
with a bunch of confusion, so I'm not sure I got all the details.

>
> >
> > > We found that the filter function was not invoked when uretprobe was
> > > initially implemented, and this has been existing for ten years.
> >
> > See above, this is correct.
> >
> > Note also that if you only use perf-probe + perf-record, no matter how
> > many instances, you can even add BUG_ON(!uprobe_perf_filter(...)) into
> > uretprobe_perf_func(). IIRC, perf doesn't use create_local_trace_uprobe().
> >
> > ---------------------------------------------------------------------------
> > Now lets return to BPF and this particular problem. I won't really argue
> > with this patch, but
> >
> >       - Please change the subject and update the changelog,
> >         "Fixes: c1ae5c75e103" and the whole reasoning is misleading
> >         and wrong, IMO.
> >
> >       - This patch won't fix all problems because uprobe_perf_filter()
> >         filters by mm, not by pid. The changelog/patch assumes that it
> >         is a "PID filter", but it is not.
> >
> >         See https://lore.kernel.org/linux-trace-kernel/20240825224018.GD3906@xxxxxxxxxx/
> >         If the traced process does clone(CLONE_VM), bpftrace will hit the
> >         similar problem, with uprobe or uretprobe.
> >
> >       - So I still think that the "right" fix should change the
> >         bpf_prog_run_array_uprobe() paths somehow, but I know nothing
> >         about bpf.
>
> to follow the perf event filter properly, bpf_prog_run_array_uprobe should
> be called in perf_tp_event after the perf_tp_event_match call, but that's
> already under preempt_disable, so that's why it's called before that
>
> jirka





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux