> On May 17, 2019, at 2:06 PM, Alexei Starovoitov <ast@xxxxxx> wrote: > > On 5/17/19 11:40 AM, Song Liu wrote: >> +Alexei, Daniel, and bpf >> >>> On May 17, 2019, at 2:10 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: >>> >>> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote: >>>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe >>>> some other bfp functions) is now broken, or, strating an unwind >>>> directly inside a bpf program will end up strangely. It have following >>>> kernel message: >>> >>> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't >>> follow. >> >> I guess we need something like the following? (we should be able to >> optimize the PER_CPU stuff). >> >> Thanks, >> Song >> >> >> diff --git i/kernel/trace/bpf_trace.c w/kernel/trace/bpf_trace.c >> index f92d6ad5e080..c525149028a7 100644 >> --- i/kernel/trace/bpf_trace.c >> +++ w/kernel/trace/bpf_trace.c >> @@ -696,11 +696,13 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_tp = { >> .arg5_type = ARG_CONST_SIZE_OR_ZERO, >> }; >> >> +static DEFINE_PER_CPU(struct pt_regs, bpf_stackid_tp_regs); >> BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map, >> u64, flags) >> { >> - struct pt_regs *regs = *(struct pt_regs **)tp_buff; >> + struct pt_regs *regs = this_cpu_ptr(&bpf_stackid_tp_regs); >> >> + perf_fetch_caller_regs(regs); > > No. pt_regs is already passed in. It's the first argument. > If we call perf_fetch_caller_regs() again the stack trace will be wrong. > bpf prog should not see itself, interpreter or all the frames in between. Thanks Alexei! I get it now. In bpf_get_stackid_tp(), the pt_regs is get by dereferencing the first field of tp_buff: struct pt_regs *regs = *(struct pt_regs **)tp_buff; tp_buff points to something like struct sched_switch_args { unsigned long long pad; char prev_comm[16]; int prev_pid; int prev_prio; long long prev_state; char next_comm[16]; int next_pid; int next_prio; }; where the first field "pad" is a pointer to pt_regs. @Kairui, I think you confirmed that current code will give empty call trace with ORC unwinder? If that's the case, can we add regs->ip back? (as in the first email of this thread. Thanks, Song