On Tue, Oct 1, 2024 at 5:23 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>
> Makes sense, though will we have cases where hierarchical scheduling
> attaches the same prog at different points of the hierarchy?

I'm not sure anyone was asking for such a use case.

> Then the
> limit of 4 may not be enough (e.g. say with cgroup nested levels > 4).

Well, 4 was the number from TJ.

Anyway the proposed pseudo code:

__bpf_prog_enter_recur_limited()
{
  cnt = this_cpu_inc_return(*(prog->active));
  if (cnt > 4) {
     inc_miss
     return 0;
  }
  // pass cnt into bpf prog somehow, like %rdx ?
  // or re-read prog->active from prog
}

then in the prologue emit:

push rbp
mov rbp, rsp
if %rdx == 1
   // main prog is called for the first time
   mov rsp, pcpu_priv_stack_top
else
   // 2+nd time main prog is called or 1+ time subprog
   sub rsp, stack_size
   if rsp < pcpu_priv_stack_bottom
     goto exit  // stack is too small, exit
fi

Since stack bottom/top are known at JIT time we can generate
reliable stack overflow checks.
Much better than guard pages and -fstack-protector.
The prog can alloc percpu
(stack size of main prog + subprogs + extra) * 4
and it likely will be enough.
If not, the stack protection will gently exit the prog
when the stack is too deep.
kfunc won't have such a check, so we need a buffer zone.
Can have a guard page too, but feels like overkill.
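
To make the interaction between the recursion counter and the stack check
concrete, here is a minimal userspace sketch of the scheme above. It is only
a model, not kernel or JIT code: struct sim_prog and the sim_* helpers are
hypothetical names, the private stack is represented by plain integer
offsets, and the per-CPU counter is an ordinary int. As a simplification,
sim_enter() undoes its own increment when it rejects an entry.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAX_RECURSION 4	/* the limit of 4 discussed in the thread */

/* Hypothetical model of one prog's state on one CPU. */
struct sim_prog {
	int active;		/* stands in for the per-CPU prog->active */
	int misses;		/* entries rejected by the recursion limit */
	long frame_size;	/* stands in for stack_size in the prologue */
	long stack_bottom;	/* stands in for pcpu_priv_stack_bottom */
	long stack_top;		/* stands in for pcpu_priv_stack_top */
	long sp;		/* stands in for rsp */
};

/* Models __bpf_prog_enter_recur_limited(): nesting count, or 0 if rejected. */
static int sim_enter(struct sim_prog *p)
{
	int cnt = ++p->active;

	if (cnt > SIM_MAX_RECURSION) {
		p->misses++;
		p->active--;	/* simplification: undo here, no exit call */
		return 0;
	}
	return cnt;
}

/* Models the emitted prologue: false means "goto exit". */
static bool sim_prologue(struct sim_prog *p, int cnt)
{
	if (cnt == 1) {
		/* first entry: switch to the top of the private stack */
		p->sp = p->stack_top;
		return true;
	}
	/* nested entry: carve a frame, then check against the bottom */
	p->sp -= p->frame_size;
	if (p->sp < p->stack_bottom) {
		p->sp += p->frame_size;	/* stack is too small, back out */
		return false;
	}
	return true;
}

/* Undo the frame carved by a nested entry. */
static void sim_epilogue(struct sim_prog *p, int cnt)
{
	if (cnt > 1)
		p->sp += p->frame_size;
}

/* Pairs with a successful sim_enter(). */
static void sim_exit(struct sim_prog *p)
{
	assert(p->active > 0);
	p->active--;
}

/*
 * Simulate the prog re-entering itself up to `want` times.
 * Returns how many activations actually ran.
 */
static int sim_run(struct sim_prog *p, int want)
{
	int cnt, ran = 0;

	if (want == 0)
		return 0;
	cnt = sim_enter(p);
	if (cnt == 0)
		return 0;
	if (sim_prologue(p, cnt)) {
		ran = 1 + sim_run(p, want - 1);
		sim_epilogue(p, cnt);
	}
	sim_exit(p);
	return ran;
}
```

In this model, a region of 256 with frame_size 64 lets sim_run() reach the
recursion limit first (4 activations, 1 miss), while a region of 100 with
the same frame size is stopped by the stack check after 2 activations
without touching the miss counter.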