On Fri, Jun 26, 2020 at 4:47 PM Song Liu <songliubraving@xxxxxx> wrote:
>
>
>
> > On Jun 26, 2020, at 3:51 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Fri, Jun 26, 2020 at 3:45 PM Song Liu <songliubraving@xxxxxx> wrote:
> >>
> >>
> >>
> >>> On Jun 26, 2020, at 1:17 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> >>>
> >>> On Thu, Jun 25, 2020 at 5:14 PM Song Liu <songliubraving@xxxxxx> wrote:
> >>>>
> >>>> Introduce helper bpf_get_task_stack(), which dumps the stack trace of a given
> >>>> task. This is different from bpf_get_stack(), which gets the stack trace of
> >>>> the current task. One potential use case of bpf_get_task_stack() is to call
> >>>> it from bpf_iter__task and dump all /proc/<pid>/stack to a seq_file.
> >>>>
> >>>> bpf_get_task_stack() uses stack_trace_save_tsk() instead of
> >>>> get_perf_callchain() for the kernel stack. The benefit of this choice is that
> >>>> stack_trace_save_tsk() doesn't require changes in arch/. The downside of
> >>>> using stack_trace_save_tsk() is that it dumps the stack trace to an
> >>>> unsigned long array. For 32-bit systems, we need to translate it to a
> >>>> u64 array.
> >>>>
> >>>> Signed-off-by: Song Liu <songliubraving@xxxxxx>
> >>>> ---
> >>>
> >>> Looks great, I just think that there are cases where the user doesn't
> >>> necessarily have a valid task_struct pointer, just a pid, so it would be
> >>> nice to not artificially restrict such cases by having an extra helper.
> >>>
> >>> Acked-by: Andrii Nakryiko <andriin@xxxxxx>
> >>
> >> Thanks!
> >>
> >>>
> >>>>  include/linux/bpf.h            |  1 +
> >>>>  include/uapi/linux/bpf.h       | 35 ++++++++++++++-
> >>>>  kernel/bpf/stackmap.c          | 79 ++++++++++++++++++++++++++++++++--
> >>>>  kernel/trace/bpf_trace.c       |  2 +
> >>>>  scripts/bpf_helpers_doc.py     |  2 +
> >>>>  tools/include/uapi/linux/bpf.h | 35 ++++++++++++++-
> >>>>  6 files changed, 149 insertions(+), 5 deletions(-)
> >>>>
> >>>
> >>> [...]
> >>>
> >>>> +	/* stack_trace_save_tsk() works on unsigned long array, while
> >>>> +	 * perf_callchain_entry uses u64 array. For 32-bit systems, it is
> >>>> +	 * necessary to fix this mismatch.
> >>>> +	 */
> >>>> +	if (__BITS_PER_LONG != 64) {
> >>>> +		unsigned long *from = (unsigned long *) entry->ip;
> >>>> +		u64 *to = entry->ip;
> >>>> +		int i;
> >>>> +
> >>>> +		/* copy data from the end to avoid using extra buffer */
> >>>> +		for (i = entry->nr - 1; i >= (int)init_nr; i--)
> >>>> +			to[i] = (u64)(from[i]);
> >>>
> >>> Doing this forward would be just fine as well, no? The first iteration
> >>> will cast and overwrite the low 32 bits, and all the subsequent
> >>> iterations won't even overlap.
> >>
> >> I think the first iteration will write zeros to the higher 32 bits, no?
> >
> > Oh, wait, I completely misread what this is doing. It up-converts from
> > 32-bit to 64-bit, sorry. Yeah, ignore me on this :)
> >
> > But then I have another question. How do you know that entry->ip has
> > enough space to hold the same number of 2x bigger entries?
>
> The buffer is sized for sysctl_perf_event_max_stack u64 numbers.
> stack_trace_save_tsk() will put at most sysctl_perf_event_max_stack
> unsigned longs in it (init_nr == 0). So the buffer is big enough.
>

Awesome, thanks for the clarification!

> Thanks,
> Song
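
For readers following the in-place conversion discussed above, here is a small
standalone sketch in userspace C. It is not the kernel code from the patch:
buf stands in for entry->ip, nr for entry->nr with init_nr == 0, uint32_t plays
the role of a 32-bit unsigned long, and the names widen_in_place/packed are
made up for illustration. It demonstrates why the loop must run from the end:
a forward copy's first 64-bit store would overwrite the second 32-bit source
element before it is read. The type-punning cast relies on
-fno-strict-aliasing semantics, as the kernel build does; a strictly portable
version would use memcpy.

/* Standalone userspace sketch (not the kernel patch): widen an array of
 * 32-bit values into 64-bit slots in place, mirroring what stackmap.c does
 * with stack_trace_save_tsk()'s unsigned long entries on a 32-bit kernel.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static void widen_in_place(uint64_t *buf, unsigned int nr)
{
	uint32_t *from = (uint32_t *)buf;   /* packed 32-bit source values */
	uint64_t *to = buf;                 /* 64-bit destination, same memory */
	int i;

	/* Copy from the end: writing to[i] touches from[2*i] and from[2*i + 1].
	 * For i >= 1 those indices are greater than i, so they were already
	 * read on earlier (higher-i) iterations; for i == 0, from[0] is read
	 * before the write. A forward loop would clobber from[1] while
	 * storing to[0].
	 */
	for (i = (int)nr - 1; i >= 0; i--)
		to[i] = (uint64_t)from[i];
}

int main(void)
{
	const unsigned int max = 8;  /* stands in for sysctl_perf_event_max_stack */
	const unsigned int nr = 5;   /* number of frames actually captured */
	uint64_t *buf = calloc(max, sizeof(*buf));
	uint32_t *packed = (uint32_t *)buf;
	unsigned int i;

	if (!buf)
		return 1;

	/* Pretend these are return addresses saved as 32-bit longs. */
	for (i = 0; i < nr; i++)
		packed[i] = 0x1000 + i;

	widen_in_place(buf, nr);

	for (i = 0; i < nr; i++)
		printf("frame %u: %#llx\n", i, (unsigned long long)buf[i]);

	free(buf);
	return 0;
}

The backward copy lets the conversion reuse the same buffer, and, as noted in
the thread, that buffer is allocated for sysctl_perf_event_max_stack u64
entries while stack_trace_save_tsk() stores at most that many unsigned longs,
so the widened data always fits.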