On Fri, Jun 26, 2020 at 3:45 PM Song Liu <songliubraving@xxxxxx> wrote: > > > > > On Jun 26, 2020, at 1:17 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > > > > On Thu, Jun 25, 2020 at 5:14 PM Song Liu <songliubraving@xxxxxx> wrote: > >> > >> Introduce helper bpf_get_task_stack(), which dumps stack trace of given > >> task. This is different to bpf_get_stack(), which gets stack track of > >> current task. One potential use case of bpf_get_task_stack() is to call > >> it from bpf_iter__task and dump all /proc/<pid>/stack to a seq_file. > >> > >> bpf_get_task_stack() uses stack_trace_save_tsk() instead of > >> get_perf_callchain() for kernel stack. The benefit of this choice is that > >> stack_trace_save_tsk() doesn't require changes in arch/. The downside of > >> using stack_trace_save_tsk() is that stack_trace_save_tsk() dumps the > >> stack trace to unsigned long array. For 32-bit systems, we need to > >> translate it to u64 array. > >> > >> Signed-off-by: Song Liu <songliubraving@xxxxxx> > >> --- > > > > Looks great, I just think that there are cases where user doesn't > > necessarily has valid task_struct pointer, just pid, so would be nice > > to not artificially restrict such cases by having extra helper. > > > > Acked-by: Andrii Nakryiko <andriin@xxxxxx> > > Thanks! > > > > >> include/linux/bpf.h | 1 + > >> include/uapi/linux/bpf.h | 35 ++++++++++++++- > >> kernel/bpf/stackmap.c | 79 ++++++++++++++++++++++++++++++++-- > >> kernel/trace/bpf_trace.c | 2 + > >> scripts/bpf_helpers_doc.py | 2 + > >> tools/include/uapi/linux/bpf.h | 35 ++++++++++++++- > >> 6 files changed, 149 insertions(+), 5 deletions(-) > >> > > > > [...] > > > >> + /* stack_trace_save_tsk() works on unsigned long array, while > >> + * perf_callchain_entry uses u64 array. For 32-bit systems, it is > >> + * necessary to fix this mismatch. > >> + */ > >> + if (__BITS_PER_LONG != 64) { > >> + unsigned long *from = (unsigned long *) entry->ip; > >> + u64 *to = entry->ip; > >> + int i; > >> + > >> + /* copy data from the end to avoid using extra buffer */ > >> + for (i = entry->nr - 1; i >= (int)init_nr; i--) > >> + to[i] = (u64)(from[i]); > > > > doing this forward would be just fine as well, no? First iteration > > will cast and overwrite low 32-bits, all the subsequent iterations > > won't even overlap. > > I think first iteration will write zeros to higher 32 bits, no? Oh, wait, I completely misread what this is doing. It up-converts from 32-bit to 64-bit, sorry. Yeah, ignore me on this :) But then I have another question. How do you know that entry->ip has enough space to keep the same number of 2x bigger entries? > > > > >> + } > >> + > >> +exit_put: > >> + put_callchain_entry(rctx); > >> + > >> + return entry; > >> +} > >> + > > > > [...] > > > >> +BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf, > >> + u32, size, u64, flags) > >> +{ > >> + struct pt_regs *regs = task_pt_regs(task); > >> + > >> + return __bpf_get_stack(regs, task, buf, size, flags); > >> +} > > > > > > So this takes advantage of BTF and having a direct task_struct > > pointer. But for kprobes/tracepoint I think it would also be extremely > > helpful to be able to request stack trace by PID. How about one more > > helper which will wrap this one with get/put task by PID, e.g., > > bpf_get_pid_stack(int pid, void *buf, u32 size, u64 flags)? Would that > > be a problem? > > That should work. Let me add that in a follow up patch. >