On Thu, 3 Oct 2024 at 08:17, Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> On 10/1/24 6:26 PM, Alexei Starovoitov wrote:
> > On Tue, Oct 1, 2024 at 5:23 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
> >> Makes sense, though will we have cases where hierarchical scheduling
> >> attaches the same prog at different points of the hierarchy?
> > I'm not sure anyone was asking for such a use case.
> >
> >> Then the
> >> limit of 4 may not be enough (e.g. say with cgroup nested levels > 4).
> > Well, 4 was the number from TJ.
> >
> > Anyway the proposed pseudo code:
> >
> > __bpf_prog_enter_recur_limited()
> > {
> >    cnt = this_cpu_inc_return(*(prog->active));
> >    if (cnt > 4) {
> >       inc_miss
> >       return 0;
> >    }
> >    // pass cnt into bpf prog somehow, like %rdx ?
> >    // or re-read prog->active from prog
> > }
> >
> >
> > then in the prologue emit:
> >
> > push rbp
> > mov rbp, rsp
> > if %rdx == 1
> >    // main prog is called for the first time
> >    mov rsp, pcpu_priv_stack_top
> > else
> >    // 2+nd time main prog is called or 1+ time subprog
> >    sub rsp, stack_size
> >    if rsp < pcpu_priv_stack_bottom
> >       goto exit // stack is too small, exit
> > fi
>
> I have tried to implement this approach (not handling
> recursion yet) based on the above approach. It works
> okay with nested bpf subprogs like
>    main prog // set rsp = pcpu_priv_stack_top
>      subprog1 // some stack
>        subprog2 // some stack
>
> The pcpu_priv_stack is allocated like
>    priv_stack_ptr = __alloc_percpu_gfp(1024 * 16, 8, GFP_KERNEL);
>
> But whenever the prog called an external function,
> e.g. a helper in this case, I will get a double fault.
> An example could be
>    main prog // set rsp = pcpu_priv_stack_top
>      subprog1 // some stack
>        subprog2 // some stack
>          call bpf_seq_printf
> (I modified bpf_iter_ipv6_route.c bpf prog for the above
> purpose.)
> I added some printk statements from the beginning of bpf_seq_printf and
> nothing printed out either and of course traps still happens.
>
> I tried another example without subprog and the mainprog calls
> a helper and the same double traps happens below too.
>
> The error log looks like
>
> [ 54.024955] traps: PANIC: double fault, error_code: 0x0
> [ 54.024969] Oops: double fault: 0000 [#1] PREEMPT SMP KASAN PTI
> [ 54.024977] CPU: 3 UID: 0 PID: 1946 Comm: test_progs Tainted: G OE 6.11.0-10577-gf25c172fd840-dirty #968
> [ 54.024982] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [ 54.024983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> [ 54.024986] RIP: 0010:error_entry+0x1e/0x140
> [ 54.024996] Code: ff ff 90 90 90 90 90 90 90 90 90 90 56 48 8b 74 24 08 48 89 7c 24 08 52 51 50 41 50 41 51 41 52 41 53 53 55 41 54 41 55 41 56 <41> 57 56 31 f6 31 d1
> [ 54.024999] RSP: 0018:ffffe8ffff580000 EFLAGS: 00010806
> [ 54.025002] RAX: f3f3f300f1f1f1f1 RBX: fffff91fffeb0044 RCX: ffffffff84201701
> [ 54.025005] RDX: fffff91fffeb0044 RSI: ffffffff8420128d RDI: ffffe8ffff580178
> [ 54.025007] RBP: ffffe8ffff580140 R08: 0000000000000000 R09: 0000000000000000
> [ 54.025009] R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
> [ 54.025010] R13: 1ffffd1fffeb0014 R14: 0000000000000003 R15: ffffe8ffff580178
> [ 54.025012] FS: 00007fd076525d00(0000) GS:ffff8881f7180000(0000) knlGS:0000000000000000
> [ 54.025015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 54.025017] CR2: ffffe8ffff57fff8 CR3: 000000010cd80002 CR4: 0000000000370ef0
> [ 54.025021] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 54.025022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 54.025024] Call Trace:
> [ 54.025026] <#DF>
> [ 54.025028] ? __die_body+0xaf/0xc0
> [ 54.025032] ? die+0x2f/0x50
> [ 54.025036] ? exc_double_fault+0x73/0x80
> [ 54.025040] ? asm_exc_double_fault+0x23/0x30
> [ 54.025044] ? common_interrupt_return+0xb1/0xcc
> [ 54.025048] ? asm_exc_page_fault+0xd/0x30
> [ 54.025051] ? error_entry+0x1e/0x140
> [ 54.025055] </#DF>
> [ 54.025056] Modules linked in: bpf_testmod(OE)
> [ 54.025061] ---[ end trace 0000000000000000 ]---
>
> Maybe somebody could give a hint why I got a double fault
> when calling external functions (outside of bpf programs)
> with allocated stack?
>

I will help in debugging. Can you share the patch you applied locally
so I can reproduce?

> >
> > Since stack bottom/top are known at JIT time we can
> > generate reliable stack overflow checks.
> > Much better than guard pages and -fstack-protector.
> > The prog can alloc percpu
> > (stack size of main prog + subprogs + extra) * 4
> > and it likely will be enough.
> > If not, the stack protection will gently exit the prog
> > when the stack is too deep.
> > kfunc won't have such a check, so we need a buffer zone.
> > Can have a guard page too, but feels like overkill.
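
To check that I am reading the proposal the same way, here is a rough
userspace model of the recursion limit, the sizing formula and the
JIT-time bounds check quoted above. It is only a sketch, not kernel
code: priv_stack_model, model_enter, model_exit, model_prologue and
priv_stack_bytes are names I made up for illustration. I am also
assuming that the frame is reserved on the first entry as well (the
pseudo code leaves that implicit) and that the exit path always
decrements the counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RECUR 4	/* the limit of 4 mentioned in the thread */

/* Hypothetical per-CPU state; a model only, not a kernel structure. */
struct priv_stack_model {
	uintptr_t bottom;	/* lowest usable address (pcpu_priv_stack_bottom) */
	uintptr_t top;		/* initial stack pointer (pcpu_priv_stack_top) */
	uintptr_t sp;		/* modeled rsp */
	int active;		/* models *(prog->active) on this CPU */
	int misses;		/* models inc_miss */
};

/* Bytes to reserve per CPU: (main prog + subprogs + extra) * MAX_RECUR. */
size_t priv_stack_bytes(size_t main_sz, size_t subprogs_sz, size_t extra)
{
	return (main_sz + subprogs_sz + extra) * MAX_RECUR;
}

/* Models __bpf_prog_enter_recur_limited(): returns cnt, or 0 on a miss. */
int model_enter(struct priv_stack_model *m)
{
	int cnt = ++m->active;

	if (cnt > MAX_RECUR) {
		m->misses++;
		return 0;
	}
	return cnt;
}

/* Assumption: exit always undoes the increment, including after a miss. */
void model_exit(struct priv_stack_model *m)
{
	m->active--;
}

/*
 * Models the emitted prologue. Returns 1 if the frame fits and 0 for the
 * "goto exit" case. The check is done before subtracting so that the
 * modeled stack pointer never goes below bottom.
 */
int model_prologue(struct priv_stack_model *m, int cnt, size_t stack_size)
{
	if (cnt == 1)
		m->sp = m->top;	/* first entry: switch to the private stack */

	assert(m->sp >= m->bottom && m->sp <= m->top);
	if (m->sp - m->bottom < stack_size)
		return 0;	/* stack is too small, exit */

	m->sp -= stack_size;
	return 1;
}
```

With 64 + 128 + 64 bytes per level this reserves 1024 bytes, and a
third nested entry using 512-byte frames takes the exit path instead of
running past the bottom. The model does not cover helpers or kfuncs,
which run without such a check, so the buffer zone is still needed.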