On Wed, May 15, 2024 at 3:30 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Wed, May 08, 2024 at 02:26:03PM -0700, Andrii Nakryiko wrote:
>
> > +static void fixup_uretprobe_trampoline_entries(struct perf_callchain_entry *entry,
> > +                                               int start_entry_idx)
> > +{
> > +#ifdef CONFIG_UPROBES
> > +        struct uprobe_task *utask = current->utask;
> > +        struct return_instance *ri;
> > +        __u64 *cur_ip, *last_ip, tramp_addr;
> > +
> > +        if (likely(!utask || !utask->return_instances))
> > +                return;
> > +
> > +        cur_ip = &entry->ip[start_entry_idx];
> > +        last_ip = &entry->ip[entry->nr - 1];
> > +        ri = utask->return_instances;
> > +        tramp_addr = uprobe_get_trampoline_vaddr();
> > +
> > +        /* If there are pending uretprobes for current thread, they are
>
> Comment style fail. Also 'for *the* current thread'.
>

ack, will fix

> > +         * recorded in a list inside utask->return_instances; each such
> > +         * pending uretprobe replaces traced user function's return address on
> > +         * the stack, so when stack trace is captured, instead of seeing
> > +         * actual function's return address, we'll have one or many uretprobe
> > +         * trampoline addresses in the stack trace, which are not helpful and
> > +         * misleading to users.
>
> I would beg to differ, what if the uprobe is causing the performance
> issue?

If the uprobe/uretprobe code itself is causing performance issues,
you'll see that in other stack traces, where this code will be
actively running on CPU; I don't think we make anything worse here.
Here we are talking about the case where the uprobe part is done and
has hijacked the return address on the stack, but the uretprobe is
not yet running (and so is not causing any performance issues). The
presence of this "snooping" (pending) uretprobe is irrelevant to the
user that is capturing the stack trace. Right now the address in the
[uprobes] VMA section, installed by the uretprobe infra code, directly
replaces the correct and actual calling function's address.

Worst case, one can argue that both the [uprobes] address and the
original caller address should be in the stack trace, but I think
that would still be confusing to users. It would also make the
implementation less efficient, because we'd then have to insert
entries into the array and shift everything around.

So, as I mentioned above, if the concern is seeing uprobe/uretprobe
code using CPU, that doesn't change: we'll still see it in the
overall set of captured stack traces (be it custom uprobe handler
code or a BPF program).

> While I do think it makes sense to fix the unwind in the sense that we
> should be able to continue the unwind, I don't think it makes sense to
> completely hide the presence of uprobes.

Unwind isn't broken in this sense; we do unwind the entire stack
trace (see examples in the later patch). We just don't capture the
actual callers if they have a uretprobe pending.

> > +         * So here we go over the pending list of uretprobes, and each
> > +         * encountered trampoline address is replaced with actual return
> > +         * address.
> > +         */
> > +        while (ri && cur_ip <= last_ip) {
> > +                if (*cur_ip == tramp_addr) {
> > +                        *cur_ip = ri->orig_ret_vaddr;
> > +                        ri = ri->next;
> > +                }
> > +                cur_ip++;
> > +        }
> > +#endif
> > +}
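
P.S. For anyone following the thread who wants to poke at the
replacement logic outside the kernel, below is a stand-alone
user-space toy model of the loop above. All names, addresses, and the
fixed-size array are made up for illustration; this is not kernel
code or kernel API, just a sketch of the in-place replacement idea:

#include <stdio.h>
#include <stdint.h>

/* toy stand-in for struct return_instance (illustration only) */
struct toy_ret_inst {
        uint64_t orig_ret_vaddr;        /* real return address that was hijacked */
        struct toy_ret_inst *next;
};

int main(void)
{
        /* pretend this is the trampoline address in the [uprobes] VMA */
        uint64_t tramp_addr = 0x7fff00001000ULL;

        /* captured "stack trace": entries 1 and 3 were clobbered by
         * pending uretprobes, the rest are ordinary return addresses
         */
        uint64_t ip[] = { 0x401000, tramp_addr, 0x402000, tramp_addr, 0x403000 };
        int nr = sizeof(ip) / sizeof(ip[0]);

        /* pending uretprobes, innermost frame first, same order as on the stack */
        struct toy_ret_inst outer = { 0x405678, NULL };
        struct toy_ret_inst inner = { 0x401234, &outer };
        struct toy_ret_inst *ri = &inner;

        /* same in-place replacement as the kernel loop: no insertion,
         * no shifting, each trampoline hit consumes one pending entry
         */
        for (int i = 0; i < nr && ri; i++) {
                if (ip[i] == tramp_addr) {
                        ip[i] = ri->orig_ret_vaddr;
                        ri = ri->next;
                }
        }

        for (int i = 0; i < nr; i++)
                printf("ip[%d] = 0x%llx\n", i, (unsigned long long)ip[i]);

        return 0;
}

With the two pending entries above, the two trampoline hits get
rewritten in order to 0x401234 and 0x405678, and all the other
entries are left untouched.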