Re: yet another approach Was: [PATCH bpf-next v3 4/5] bpf, x86: Add jit support for private stack

On 10/8/24 7:06 PM, Alexei Starovoitov wrote:
On Tue, Oct 8, 2024 at 3:10 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
We need to scrap this idea.
Let's go back to push/pop r11 around calls :(
I didn't give up :)

Here is a new idea that seems to work:

[  131.472066]  dump_stack_lvl+0x53/0x70
[  131.472066]  bpf_task_storage_get+0x3e/0x2f0
[  131.472066]  ? bpf_task_storage_get+0x231/0x2f0
[  131.472066]  bpf_prog_ed7a5f33cc9fefab_foo+0x30/0x32
[  131.472066]  bpf_prog_8c4f9bc79da6c27e_socket_post_create+0x68/0x6d
...
[  131.417145]  dump_stack_lvl+0x53/0x70
[  131.417145]  bpf_task_storage_get+0x3e/0x2f0
[  131.417145]  ? selinux_netlbl_socket_post_create+0xab/0x150
[  131.417145]  bpf_prog_8c4f9bc79da6c27e_socket_post_create+0x60/0x6d


The stack dump works fine out of main prog and out of subprog.

The key difference is to pretend to have stack_depth=0,
so there is no adjustment to %rsp,
but point %rbp at the per-cpu private stack and grow it _up_.

For the main prog %rbp points to the bottom of the priv stack
plus the stack_depth it needs,
so all bpf insns that do r10-off accesses hit the bottom of that priv stack.
When a subprog is called it does 'add %rbp, its_stack_depth' and
in turn uses memory above the bottom of the priv stack.
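
In C terms the frame pointer math looks roughly like this (a minimal sketch, not code from the attached patch; main_prog_fp()/subprog_fp() are made-up names, only this_cpu_ptr() and round_up() are existing kernel helpers):

static void *main_prog_fp(void __percpu *priv_stack_ptr, u32 main_depth)
{
	/* main prog rbp: bottom of this CPU's priv stack + its own frame size */
	return this_cpu_ptr(priv_stack_ptr) + round_up(main_depth, 8);
}

static void *subprog_fp(void *caller_fp, u32 subprog_depth)
{
	/* the 'add %rbp, its_stack_depth' step: the callee frame sits above the caller's */
	return caller_fp + round_up(subprog_depth, 8);
}

Since bpf insns use negative r10 offsets, each frame occupies [fp - depth, fp): the main prog uses the very bottom of the priv stack and each callee the region just above its caller.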

That seems to work, but exceptions and tailcalls are broken.
I ran out of time today to debug.
Pls see the attached patch.

The core part of the code is below:

	EMIT1(0x55);             /* push rbp */
-	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
+	if (tail_call_reachable || !bpf_prog->aux->priv_stack_ptr) {
+		EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
+	} else {
+		if (!is_subprog) {
+			/* mov rsp, pcpu_priv_stack_bottom */
+			void __percpu *priv_frame_ptr =
+				bpf_prog->aux->priv_stack_ptr + round_up(stack_depth, 8);
+
+			/* movabs sp, priv_frame_ptr */
+			emit_mov_imm64(&prog, AUX_REG, (long) priv_frame_ptr >> 32,
+				       (u32) (long) priv_frame_ptr);
+
+			/* add <aux_reg>, gs:[<off>] */
+			EMIT2(0x65, 0x4c);
+			EMIT3(0x03, 0x1c, 0x25);
+			EMIT((u32)(unsigned long)&this_cpu_off, 4);
+			/* mov rbp, aux_reg */
+			EMIT3(0x4c, 0x89, 0xdd);
+		} else {
+			/* add rbp, stack_depth */
+			EMIT3_off32(0x48, 0x81, 0xC5, round_up(stack_depth, 8));
+		}
+	}
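
Decoding the raw bytes in the two new branches (my reading of the encodings, for reference; AUX_REG is r11 in the x86 JIT):

  main prog:
	movabs $priv_frame_ptr, %r11        /* emit_mov_imm64() */
	add    %gs:this_cpu_off, %r11       /* 65 4c 03 1c 25 <disp32> */
	mov    %r11, %rbp                   /* 4c 89 dd */

  subprog:
	add    $round_up(stack_depth, 8), %rbp   /* 48 81 c5 <imm32> */

i.e. the main prog materializes this CPU's priv stack frame pointer into rbp, and each subprog just bumps rbp by its own frame size.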

So for the main program, we have

  push rbp
  rbp = per_cpu_ptr(priv_stack_ptr + stack_size)
  ...

What will happen if we have an interrupt like below?

  push rbp
  rbp = per_cpu_ptr(priv_stack_ptr + stack_size)   <=== interrupt happens here
  ...

If we need to dump the stack trace at the interrupt point, the unwinder may have difficulty finding the proper stack trace, since *rbp is an arbitrary value and *(rbp + 8) will not hold a proper func return address. Does this make sense?
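
To illustrate the concern: a frame-pointer based unwinder essentially does the walk below (a hypothetical, simplified sketch, not the actual kernel unwinder code):

struct frame {
	struct frame *next;	/* expected to be the caller's saved rbp, i.e. *rbp   */
	unsigned long ret;	/* expected to be the return address, i.e. *(rbp + 8) */
};

static void walk(struct frame *fp)
{
	while (fp) {
		pr_info("ret addr: %lx\n", fp->ret);
		fp = fp->next;	/* bogus once rbp points into the priv stack */
	}
}

With rbp pointing into the private stack, *rbp and *(rbp + 8) are whatever the program last stored there, so a walk like this would follow garbage.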




