On Tue, Oct 8, 2024 at 11:31 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > > > On 10/8/24 7:06 PM, Alexei Starovoitov wrote: > > On Tue, Oct 8, 2024 at 3:10 PM Alexei Starovoitov > > <alexei.starovoitov@xxxxxxxxx> wrote: > >> We need to scrap this idea. > >> Let's go back to push/pop r11 around calls :( > > I didn't give up :) > > > > Here is a new idea that seems to work: > > > > [ 131.472066] dump_stack_lvl+0x53/0x70 > > [ 131.472066] bpf_task_storage_get+0x3e/0x2f0 > > [ 131.472066] ? bpf_task_storage_get+0x231/0x2f0 > > [ 131.472066] bpf_prog_ed7a5f33cc9fefab_foo+0x30/0x32 > > [ 131.472066] bpf_prog_8c4f9bc79da6c27e_socket_post_create+0x68/0x6d > > ... > > [ 131.417145] dump_stack_lvl+0x53/0x70 > > [ 131.417145] bpf_task_storage_get+0x3e/0x2f0 > > [ 131.417145] ? selinux_netlbl_socket_post_create+0xab/0x150 > > [ 131.417145] bpf_prog_8c4f9bc79da6c27e_socket_post_create+0x60/0x6d > > > > > > The stack dump works fine out of main prog and out of subprog. > > > > The key difference it to pretend to have stack_depth=0, > > so there is no adjustment to %rsp, > > but point %rbp to per-cpu private stack and grow it _up_. > > > > For the main prog %rbp points to the bottom of priv stack > > plus stack_depth it needs, > > so all bpf insns that do r10-off access the bottom of that priv stack. > > When subprog is called it does 'add %rbp, its_stack_depth' and > > in turn it's using memory above the bottom of the priv stack. > > > > That seems to work, but exceptions and tailcalls are broken. > > I ran out of time today to debug. > > Pls see the attached patch. > > The core part of the code is below: > > EMIT1(0x55); /* push rbp */ - EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp > */ + if (tail_call_reachable || !bpf_prog->aux->priv_stack_ptr) { + > EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */ + } else { + if > (!is_subprog) { + /* mov rsp, pcpu_priv_stack_bottom */ + void __percpu > *priv_frame_ptr = + bpf_prog->aux->priv_stack_ptr + > round_up(stack_depth, 8); + + /* movabs sp, priv_frame_ptr */ + > emit_mov_imm64(&prog, AUX_REG, (long) priv_frame_ptr >> 32, + (u32) > (long) priv_frame_ptr); + + /* add <aux_reg>, gs:[<off>] */ + > EMIT2(0x65, 0x4c); + EMIT3(0x03, 0x1c, 0x25); + EMIT((u32)(unsigned > long)&this_cpu_off, 4); + /* mov rbp, aux_reg */ + EMIT3(0x4c, 0x89, > 0xdd); + } else { + /* add rbp, stack_depth */ + EMIT3_off32(0x48, 0x81, > 0xC5, round_up(stack_depth, 8)); + } + } your mailer garbled the diff. > So for main program, we have > > push rbp rbp = per_cpu_ptr(priv_stack_ptr + stack_size) ... What will > happen we have an interrupt like below? push rbp rbp = > per_cpu_ptr(priv_stack_ptr + stack_size) <=== interrupt happens here ... > If we need to dump the stack trace at interrupt point then unwinder may > have difficulty to find the proper stack trace since *rbp is a arbitrary > value and *(rbp + 8) will not have proper func return address. Does this > make sense? Hard to read above... but I think you're saying that rbp will point to priv stack, irq happens and unwinder cannot work ? Yes. I was also expecting it to break, but orc unwinder with fallback to fp somehow did it correctly. See above stack dumps. For the top frame the unwinder starts from SP, so it's fine, but for the subprog 'foo' above the 'push rbp' pushes the addr of priv stack, so the chain should be broken, but the printed stack is correct, so I'm puzzled why it worked :)