On Thu, Jun 25, 2020 at 5:14 PM Song Liu <songliubraving@xxxxxx> wrote:
>
> Introduce helper bpf_get_task_stack(), which dumps the stack trace of a
> given task. This is different from bpf_get_stack(), which gets the stack
> trace of the current task. One potential use case of bpf_get_task_stack()
> is to call it from bpf_iter__task and dump all /proc/<pid>/stack to a
> seq_file.
>
> bpf_get_task_stack() uses stack_trace_save_tsk() instead of
> get_perf_callchain() for the kernel stack. The benefit of this choice is
> that stack_trace_save_tsk() doesn't require changes in arch/. The
> downside of using stack_trace_save_tsk() is that it dumps the stack
> trace to an unsigned long array. For 32-bit systems, we need to
> translate it to a u64 array.
>
> Signed-off-by: Song Liu <songliubraving@xxxxxx>
> ---

Looks great. I just think there are cases where the user doesn't
necessarily have a valid task_struct pointer, just a PID, so it would be
nice not to artificially restrict such cases; an extra helper would cover
them.

Acked-by: Andrii Nakryiko <andriin@xxxxxx>

>  include/linux/bpf.h            |  1 +
>  include/uapi/linux/bpf.h       | 35 ++++++++++++++-
>  kernel/bpf/stackmap.c          | 79 ++++++++++++++++++++++++++++++++--
>  kernel/trace/bpf_trace.c       |  2 +
>  scripts/bpf_helpers_doc.py     |  2 +
>  tools/include/uapi/linux/bpf.h | 35 ++++++++++++++-
>  6 files changed, 149 insertions(+), 5 deletions(-)
>

[...]

> +	/* stack_trace_save_tsk() works on unsigned long array, while
> +	 * perf_callchain_entry uses u64 array. For 32-bit systems, it is
> +	 * necessary to fix this mismatch.
> +	 */
> +	if (__BITS_PER_LONG != 64) {
> +		unsigned long *from = (unsigned long *) entry->ip;
> +		u64 *to = entry->ip;
> +		int i;
> +
> +		/* copy data from the end to avoid using extra buffer */
> +		for (i = entry->nr - 1; i >= (int)init_nr; i--)
> +			to[i] = (u64)(from[i]);

Doing this forward would be just fine as well, no? The first iteration
will cast and overwrite the low 32 bits, and subsequent iterations won't
even overlap.
> +	}
> +
> +exit_put:
> +	put_callchain_entry(rctx);
> +
> +	return entry;
> +}
> +

[...]

> +BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
> +	   u32, size, u64, flags)
> +{
> +	struct pt_regs *regs = task_pt_regs(task);
> +
> +	return __bpf_get_stack(regs, task, buf, size, flags);
> +}

So this takes advantage of BTF and having a direct task_struct pointer.
But for kprobes/tracepoints I think it would also be extremely helpful to
be able to request a stack trace by PID. How about one more helper that
wraps this one with get/put task by PID, e.g.,
bpf_get_pid_stack(int pid, void *buf, u32 size, u64 flags)? Would that be
a problem?

> +
> +static int bpf_get_task_stack_btf_ids[5];
> +const struct bpf_func_proto bpf_get_task_stack_proto = {
> +	.func		= bpf_get_task_stack,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_BTF_ID,
> +	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
> +	.arg4_type	= ARG_ANYTHING,
> +	.btf_id		= bpf_get_task_stack_btf_ids,
> +};
> +

[...]
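To make the suggestion concrete, here's a rough, untested sketch of what
such a PID-based wrapper could look like (the helper name and the choice
of find_get_task_by_vpid() for the lookup are my guesses, not part of the
patch):

```c
/* Hypothetical sketch only -- not part of the patch. Wraps the proposed
 * bpf_get_task_stack() logic with task lookup and refcounting by PID.
 */
BPF_CALL_4(bpf_get_pid_stack, int, pid, void *, buf, u32, size, u64, flags)
{
	struct task_struct *task;
	long ret;

	/* takes a reference on the task, or returns NULL if not found */
	task = find_get_task_by_vpid(pid);
	if (!task)
		return -ESRCH;

	ret = __bpf_get_stack(task_pt_regs(task), task, buf, size, flags);

	put_task_struct(task);
	return ret;
}
```

The helper proto would then take ARG_ANYTHING for the PID instead of
ARG_PTR_TO_BTF_ID, so it stays usable from kprobes/tracepoints without
BTF.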