Re: yet another approach Was: [PATCH bpf-next v3 4/5] bpf, x86: Add jit support for private stack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 9 Oct 2024 at 18:36, Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>
> On Wed, 9 Oct 2024 at 04:06, Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Tue, Oct 8, 2024 at 3:10 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > We need to scrap this idea.
> > > Let's go back to push/pop r11 around calls :(
> >
> > I didn't give up :)
> >
> > Here is a new idea that seems to work:
> >
> > [  131.472066]  dump_stack_lvl+0x53/0x70
> > [  131.472066]  bpf_task_storage_get+0x3e/0x2f0
> > [  131.472066]  ? bpf_task_storage_get+0x231/0x2f0
> > [  131.472066]  bpf_prog_ed7a5f33cc9fefab_foo+0x30/0x32
> > [  131.472066]  bpf_prog_8c4f9bc79da6c27e_socket_post_create+0x68/0x6d
> > ...
> > [  131.417145]  dump_stack_lvl+0x53/0x70
> > [  131.417145]  bpf_task_storage_get+0x3e/0x2f0
> > [  131.417145]  ? selinux_netlbl_socket_post_create+0xab/0x150
> > [  131.417145]  bpf_prog_8c4f9bc79da6c27e_socket_post_create+0x60/0x6d
> >
> >
> > The stack dump works fine out of main prog and out of subprog.
> >
> > The key difference it to pretend to have stack_depth=0,
> > so there is no adjustment to %rsp,
> > but point %rbp to per-cpu private stack and grow it _up_.
> >
> > For the main prog %rbp points to the bottom of priv stack
> > plus stack_depth it needs,
> > so all bpf insns that do r10-off access the bottom of that priv stack.
> > When subprog is called it does 'add %rbp, its_stack_depth' and
> > in turn it's using memory above the bottom of the priv stack.
> >
> > That seems to work, but exceptions and tailcalls are broken.
>
> I fixed exceptions, the reason it breaks is because we:
> We get rsp and rbp for the main frame from unwinding.
> rsp has undergone subtraction for: stack depth, push r12, push callee regs.
>
> When setting up the frame for exception cb, we need to pop saved
> registers from stack and then 'reset stack frame' using mov rsp, rbp.
> That effectively undoes the subtraction that happened for stack depth,
> and at this point rsp == rbp. Then the verifier will set up the frame
> for exception cb like it does normally for any prog: subtract stack
> depth, and push callee saved regs.
>
> Now all of this was ok before, but this patch makes two changes:
> stack_depth is not subtracted, and rbp is a per-cpu stack pointer.
>
> Therefore, at the top of the stack is just the callee saved regs and r12.
> After popping those, it will be equal to the original rbp which was
> overwritten with per-cpu stack pointer.
>
> Doing mov rsp, rbp for this patch will reset rsp to per-cpu stack
> pointer. Instead, we do mov rbp, rsp. This restores the rbp to kernel
> stack pointer, and then the subsequent leave etc. return control back
> into the kernel.
>
> At least this seems to make everything work fine, and things no longer
> crash, and it looks sane etc.
>
> I will dig into the tail call case a bit later, but most likely it's a
> variation of this problem.

FYI, this is the fix.

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 268f7d37466c..02267adee14b 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -523,7 +523,7 @@ static void emit_prologue(struct bpf_prog
*bpf_prog, u8 **pprog, u32 stack_depth
                pop_callee_regs(&prog, all_callee_regs_used);
                pop_r12(&prog);
                /* Reset the stack frame. */
-               EMIT3(0x48, 0x89, 0xEC); /* mov rsp, rbp */
+               EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
        } else {
                EMIT1(0x55);             /* push rbp */
                if (tail_call_reachable || !bpf_prog->aux->priv_stack_ptr) {

>
> > I ran out of time today to debug.
> > Pls see the attached patch.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux