Re: yet another approach Was: [PATCH bpf-next v3 4/5] bpf, x86: Add jit support for private stack

On Thu, 3 Oct 2024 at 22:44, Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 10/3/24 10:35 AM, Alexei Starovoitov wrote:
> > On Thu, Oct 3, 2024 at 6:40 AM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
> >> On Thu, 3 Oct 2024 at 08:17, Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >>>
> >>> On 10/1/24 6:26 PM, Alexei Starovoitov wrote:
> >>>> On Tue, Oct 1, 2024 at 5:23 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
> >>>>> Makes sense, though will we have cases where hierarchical scheduling
> >>>>> attaches the same prog at different points of the hierarchy?
> >>>> I'm not sure anyone was asking for such a use case.
> >>>>
> >>>>> Then the
> >>>>> limit of 4 may not be enough (e.g. with cgroup nesting levels > 4).
> >>>> Well, 4 was the number from TJ.
> >>>>
> >>>> Anyway the proposed pseudo code:
> >>>>
> >>>> __bpf_prog_enter_recur_limited()
> >>>> {
> >>>>     cnt = this_cpu_inc_return(*(prog->active));
> >>>>     if (cnt > 4) {
> >>>>        inc_miss
> >>>>        return 0;
> >>>>     }
> >>>>    // pass cnt into bpf prog somehow, like %rdx ?
> >>>>    // or re-read prog->active from prog
> >>>> }
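
For reference, a rough C rendering of that enter handler, modeled on the
existing __bpf_prog_enter_recur() (the limit macro name is made up, the
rcu/migrate handling of the real enter handlers is trimmed, and how cnt
reaches the prologue is still open):

static u64 notrace __bpf_prog_enter_recur_limited(struct bpf_prog *prog,
						  struct bpf_tramp_run_ctx *run_ctx)
{
	int cnt = this_cpu_inc_return(*(prog->active));

	if (cnt > BPF_PRIV_STACK_NEST_LIMIT) {	/* 4, per TJ's number */
		bpf_prog_inc_misses_counter(prog);
		return 0;	/* tell the trampoline to skip the prog body */
	}
	/* cnt still has to be passed to the prog somehow,
	 * e.g. in %rdx, or the prologue can re-read prog->active.
	 */
	return cnt;
}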
> >>>>
> >>>>
> >>>> then in the prologue emit:
> >>>>
> >>>> push rbp
> >>>> mov rbp, rsp
> >>>> if %rdx == 1
> >>>>      // main prog is called for the first time
> >>>>      mov rsp, pcpu_priv_stack_top
> >>>> else
> >>>>      // main prog called for the 2nd+ time, or a subprog called any time
> >>>>     sub rsp, stack_size
> >>>>     if rsp < pcpu_priv_stack_bottom
> >>>>       goto exit  // stack is too small, exit
> >>>> fi
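
To make that prologue pseudo code concrete, here is a sketch of where
pcpu_priv_stack_top/bottom would come from, assuming the per-CPU area is
allocated as in the experiment below and stashed in prog->aux
(PRIV_STACK_SIZE and the priv_stack_ptr field are assumptions, not
existing names):

#define PRIV_STACK_SIZE	(1024 * 16)

	void __percpu *priv_stack_ptr;

	priv_stack_ptr = __alloc_percpu_gfp(PRIV_STACK_SIZE, 8, GFP_KERNEL);
	prog->aux->priv_stack_ptr = priv_stack_ptr;	/* hypothetical field */

	/* At prog entry on the current CPU (x86 stacks grow down);
	 * the JIT would emit this as gs-relative address math:
	 */
	void *pcpu_priv_stack_bottom = this_cpu_ptr(priv_stack_ptr);
	void *pcpu_priv_stack_top = pcpu_priv_stack_bottom + PRIV_STACK_SIZE;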
> >>> I have tried to implement this (not handling
> >>> recursion yet) based on the above approach. It works
> >>> okay with nested bpf subprogs like
> >>>      main prog  // set rsp = pcpu_priv_stack_top
> >>>        subprog1 // some stack
> >>>          subprog2 // some stack
> >>>
> >>> The pcpu_priv_stack is allocated like
> >>>     priv_stack_ptr = __alloc_percpu_gfp(1024 * 16, 8, GFP_KERNEL);
> >>>
> >>> But whenever the prog calls an external function,
> >>> e.g. a helper in this case, I get a double fault.
> >>> An example could be
> >>>      main prog  // set rsp = pcpu_priv_stack_top
> >>>        subprog1 // some stack
> >>>          subprog2 // some stack
> >>>        call bpf_seq_printf
> >>> (I modified the bpf_iter_ipv6_route.c bpf prog for the above
> >>> purpose.)
> >>> I added some printk statements at the beginning of bpf_seq_printf;
> >>> nothing printed out, and of course the trap still happens.
> >>>
> >>> I tried another example without subprogs where the main prog calls
> >>> a helper, and the same double fault happens too.
> >>>
> >>> The error log looks like:
> >>>
> >>> [   54.024955] traps: PANIC: double fault, error_code: 0x0
> >>> [   54.024969] Oops: double fault: 0000 [#1] PREEMPT SMP KASAN PTI
> >>> [   54.024977] CPU: 3 UID: 0 PID: 1946 Comm: test_progs Tainted: G           OE      6.11.0-10577-gf25c172fd840-dirty #968
> >>> [   54.024982] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> >>> [   54.024983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> >>> [   54.024986] RIP: 0010:error_entry+0x1e/0x140
> >>> [   54.024996] Code: ff ff 90 90 90 90 90 90 90 90 90 90 56 48 8b 74 24 08 48 89 7c 24 08 52 51 50 41 50 41 51 41 52 41 53 53 55 41 54 41 55 41 56 <41> 57 56 31 f6 31 d1
> >>> [   54.024999] RSP: 0018:ffffe8ffff580000 EFLAGS: 00010806
> >>> [   54.025002] RAX: f3f3f300f1f1f1f1 RBX: fffff91fffeb0044 RCX: ffffffff84201701
> >>> [   54.025005] RDX: fffff91fffeb0044 RSI: ffffffff8420128d RDI: ffffe8ffff580178
> >>> [   54.025007] RBP: ffffe8ffff580140 R08: 0000000000000000 R09: 0000000000000000
> >>> [   54.025009] R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
> >>> [   54.025010] R13: 1ffffd1fffeb0014 R14: 0000000000000003 R15: ffffe8ffff580178
> >>> [   54.025012] FS:  00007fd076525d00(0000) GS:ffff8881f7180000(0000) knlGS:0000000000000000
> >>> [   54.025015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [   54.025017] CR2: ffffe8ffff57fff8 CR3: 000000010cd80002 CR4: 0000000000370ef0
> >>> [   54.025021] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> [   54.025022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> [   54.025024] Call Trace:
> >>> [   54.025026]  <#DF>
> >>> [   54.025028]  ? __die_body+0xaf/0xc0
> >>> [   54.025032]  ? die+0x2f/0x50
> >>> [   54.025036]  ? exc_double_fault+0x73/0x80
> >>> [   54.025040]  ? asm_exc_double_fault+0x23/0x30
> >>> [   54.025044]  ? common_interrupt_return+0xb1/0xcc
> >>> [   54.025048]  ? asm_exc_page_fault+0xd/0x30
> >>> [   54.025051]  ? error_entry+0x1e/0x140
> >>> [   54.025055]  </#DF>
> >>> [   54.025056] Modules linked in: bpf_testmod(OE)
> >>> [   54.025061] ---[ end trace 0000000000000000 ]---
> >>>
> >>> Maybe somebody could give a hint as to why I get a double fault
> >>> when calling external functions (outside of bpf programs)
> >>> with the allocated private stack?
> >>>
> >> I can help with debugging. Can you share the patch you applied locally
> >> so I can reproduce?
> > Looks like the idea needs more thought.
> >
> > in_task_stack() won't recognize the private stack,
> > so it will look like a stack overflow and double fault.
> >
> > do you have CONFIG_VMAP_STACK ?
>
> Yes, my above test runs fine with CONFIG_VMAP_STACK. Let me guard
> private stack support with CONFIG_VMAP_STACK for now. Not sure
> whether distributions enable CONFIG_VMAP_STACK or not.
>

I think it is the default on most distributions (Debian, Ubuntu, Fedora, etc.).
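
If we do gate it, presumably a JIT capability check would be enough;
a sketch, assuming the series grows a bpf_jit_supports_private_stack()
hook (a made-up name, along the lines of the other bpf_jit_supports_*()
helpers):

/* arch/x86/net/bpf_jit_comp.c (sketch) */
bool bpf_jit_supports_private_stack(void)
{
	/* Yonghong's test only runs cleanly with CONFIG_VMAP_STACK,
	 * so keep private stacks off without it for now.
	 */
	return IS_ENABLED(CONFIG_VMAP_STACK);
}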
