> On May 19, 2019, at 11:07 AM, Kairui Song <kasong@xxxxxxxxxx> wrote: > > On Sat, May 18, 2019 at 5:48 AM Song Liu <songliubraving@xxxxxx> wrote: >> >> >> >>> On May 17, 2019, at 2:06 PM, Alexei Starovoitov <ast@xxxxxx> wrote: >>> >>> On 5/17/19 11:40 AM, Song Liu wrote: >>>> +Alexei, Daniel, and bpf >>>> >>>>> On May 17, 2019, at 2:10 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: >>>>> >>>>> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote: >>>>>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe >>>>>> some other bfp functions) is now broken, or, strating an unwind >>>>>> directly inside a bpf program will end up strangely. It have following >>>>>> kernel message: >>>>> >>>>> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't >>>>> follow. >>>> >>>> I guess we need something like the following? (we should be able to >>>> optimize the PER_CPU stuff). >>>> >>>> Thanks, >>>> Song >>>> >>>> >>>> diff --git i/kernel/trace/bpf_trace.c w/kernel/trace/bpf_trace.c >>>> index f92d6ad5e080..c525149028a7 100644 >>>> --- i/kernel/trace/bpf_trace.c >>>> +++ w/kernel/trace/bpf_trace.c >>>> @@ -696,11 +696,13 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_tp = { >>>> .arg5_type = ARG_CONST_SIZE_OR_ZERO, >>>> }; >>>> >>>> +static DEFINE_PER_CPU(struct pt_regs, bpf_stackid_tp_regs); >>>> BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map, >>>> u64, flags) >>>> { >>>> - struct pt_regs *regs = *(struct pt_regs **)tp_buff; >>>> + struct pt_regs *regs = this_cpu_ptr(&bpf_stackid_tp_regs); >>>> >>>> + perf_fetch_caller_regs(regs); >>> >>> No. pt_regs is already passed in. It's the first argument. >>> If we call perf_fetch_caller_regs() again the stack trace will be wrong. >>> bpf prog should not see itself, interpreter or all the frames in between. >> >> Thanks Alexei! I get it now. >> >> In bpf_get_stackid_tp(), the pt_regs is get by dereferencing the first field >> of tp_buff: >> >> struct pt_regs *regs = *(struct pt_regs **)tp_buff; >> >> tp_buff points to something like >> >> struct sched_switch_args { >> unsigned long long pad; >> char prev_comm[16]; >> int prev_pid; >> int prev_prio; >> long long prev_state; >> char next_comm[16]; >> int next_pid; >> int next_prio; >> }; >> >> where the first field "pad" is a pointer to pt_regs. >> >> @Kairui, I think you confirmed that current code will give empty call trace >> with ORC unwinder? If that's the case, can we add regs->ip back? (as in the >> first email of this thread. >> >> Thanks, >> Song >> > > Hi thanks for the suggestion, yes we can add it should be good an idea > to always have IP when stack trace is not available. > But stack trace is actually still broken, it will always give only one > level of stacktrace (the IP). I think this is still the best fix/workaround here? And only one level of stack trace should be OK for tracepoint? Thanks, Song > > -- > Best Regards, > Kairui Song