On Thu, Oct 24, 2024 at 6:48 AM Xu Kuohai <xukuohai@xxxxxxxxxxxxxxx> wrote: > > On 10/23/2024 11:16 AM, Xu Kuohai wrote: > > On 10/23/2024 7:37 AM, Puranjay Mohan wrote: > >> On Wed, Oct 23, 2024 at 12:50 AM Alexei Starovoitov > >> <alexei.starovoitov@xxxxxxxxx> wrote: > >>> > >>> On Sat, Oct 19, 2024 at 2:15 AM Xu Kuohai <xukuohai@xxxxxxxxxxxxxxx> wrote: > >>>> > >>>> From: Xu Kuohai <xukuohai@xxxxxxxxxx> > >>>> > >>>> The callsite layout for arm64 fentry is: > >>>> > >>>> mov x9, lr > >>>> nop > >>>> > >>>> When a bpf prog is attached, the nop instruction is patched to a call > >>>> to bpf trampoline: > >>>> > >>>> mov x9, lr > >>>> bl <bpf trampoline> > >>>> > >>>> This passes two return addresses to bpf trampoline: the return address > >>>> for the traced function/prog, stored in x9, and the return address for > >>>> the bpf trampoline, stored in lr. To ensure stacktrace works properly, > >>>> the bpf trampoline constructs two fake function stack frames using x9 > >>>> and lr. > >>>> > >>>> However, struct_ops progs are used as function callbacks and are invoked > >>>> directly, without x9 being set as the fentry callsite does. Therefore, > >>>> only one stack frame should be constructed using lr for struct_ops. > >>> > >>> Are you saying that currently stack unwinder on arm64 is > >>> completely broken for struct_ops progs ? > >>> or it shows an extra frame that doesn't have to be shown ? > >>> > >>> If former then it's certainly a bpf tree material. > >>> If latter then bpf-next will do. > >>> Pls clarify. > >> > >> It is not completely broken, only an extra garbage frame is shown > >> between the caller of the trampoline and its caller. > >> > > > > Yep, exactly. Here is a perf script sample, where tcp_ack+0x404 > > is the garbage frame. > > > > ffffffc0801a04b4 bpf_prog_50992e55a0f655a9_bpf_cubic_cong_avoid+0x98 (bpf_prog_50992e55a0f655a9_bpf_cubic_cong_avoid) > > ffffffc0801a228c [unknown] ([kernel.kallsyms]) // bpf trampoline > > ffffffd08d362590 tcp_ack+0x798 ([kernel.kallsyms]) // caller for bpf trampoline > > ffffffd08d3621fc tcp_ack+0x404 ([kernel.kallsyms]) // garbage frame > > ffffffd08d36452c tcp_rcv_established+0x4ac ([kernel.kallsyms]) > > ffffffd08d375c58 tcp_v4_do_rcv+0x1f0 ([kernel.kallsyms]) > > ffffffd08d378630 tcp_v4_rcv+0xeb8 ([kernel.kallsyms]) > > ... > > > > And this sample also shows that there is no symbol for the > > struct_ops bpf trampoline. Maybe we should add symbol for it? > > > > Emm, stack unwinder on x86 is completely broken for struct_ops > progs. > > It's because the following function returns 0 for a struct_ops > bpf trampoline address as there is no corresponding kernel symbol, > which causes the address not to be recognized as kerneltext. As > a result, the winder stops on ip == 0. > > unsigned long unwind_get_return_address(struct unwind_state *state) > { > if (unwind_done(state)) > return 0; > > return __kernel_text_address(state->ip) ? state->ip : 0; > } > > Here is an example of broken stack trace from perf sampling, where > only one stack frame is captured: > > ffffffffc000cfb4 bpf_prog_e60d93d3ec88d5ef_bpf_cubic_cong_avoid+0x78 (bpf_prog_e60d93d3ec88d5ef_bpf_cubic_cong_avoid) > (no more frames) you mean arch_stack_walk() won't see anything after ip=0, but dump_stack() will still print the rest with "?" (as unreliable). That's bad indeed. > To fix it, I think kernel symbol should be added for struct_ops > trampoline. Makes sense. Pls send a patch. As far as this patch please add an earlier example of double tcp_ack trace to commit log and resubmit targeting bpf-next.