Re: yet another approach Was: [PATCH bpf-next v3 4/5] bpf, x86: Add jit support for private stack

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Tue, 1 Oct 2024 18:26:25 -0700

On Tue, Oct 1, 2024 at 5:23 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>
> Makes sense, though will we have cases where hierarchical scheduling
> attaches the same prog at different points of the hierarchy?

I'm not sure anyone was asking for such a use case.

> Then the
> limit of 4 may not be enough (e.g., with more than 4 nested cgroup levels).

Well, 4 was the number from TJ.

Anyway, the proposed pseudo code:

__bpf_prog_enter_recur_limited(struct bpf_prog *prog)
{
  int cnt = this_cpu_inc_return(*(prog->active));

  if (cnt > 4) {
    bpf_prog_inc_misses_counter(prog);  // inc_miss
    return 0;
  }
  // pass cnt into the bpf prog somehow, e.g. in %rdx?
  // or re-read prog->active from the prog
}
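To make the counting concrete, here is a user-space model of the enter/exit pair, a sketch only. A plain per-CPU array stands in for prog->active and a counter stands in for inc_miss. The names prog_model, enter_recur_limited, exit_recur and NR_CPUS_MODEL are hypothetical, and the model assumes the exit side runs after every enter, whether or not the prog ran, so the counter stays balanced.

```c
#include <assert.h>

#define NR_CPUS_MODEL 2		/* hypothetical: number of simulated CPUs */
#define RECUR_LIMIT   4		/* the limit of 4 discussed above */

/* User-space stand-in for the per-CPU prog->active counter and the miss count. */
struct prog_model {
	int active[NR_CPUS_MODEL];
	unsigned long misses;
};

/*
 * Models __bpf_prog_enter_recur_limited(): returns the nesting depth
 * (1..RECUR_LIMIT) when the prog may run, or 0 when it must be skipped.
 */
static int enter_recur_limited(struct prog_model *p, int cpu)
{
	int cnt = ++p->active[cpu];	/* stands in for this_cpu_inc_return() */

	if (cnt > RECUR_LIMIT) {
		p->misses++;		/* stands in for inc_miss */
		return 0;
	}
	return cnt;			/* the value that could be passed in %rdx */
}

/* Assumed to be called after every enter, whether or not the prog ran. */
static void exit_recur(struct prog_model *p, int cpu)
{
	assert(p->active[cpu] > 0);
	p->active[cpu]--;
}
```

Six nested entries on one CPU return depths 1, 2, 3, 4, 0, 0 and record two misses; after six matching exits that CPU's counter is back to 0, and the other CPU's counter is never touched.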

Then, in the prologue, emit:

push rbp
mov rbp, rsp
if %rdx == 1
   // main prog is called for the first time
   mov rsp, pcpu_priv_stack_top
else
   // main prog is called for the 2nd+ time, or this is a subprog
   sub rsp, stack_size
   if rsp < pcpu_priv_stack_bottom
      goto exit  // stack is too small, exit
fi
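The prologue's stack selection can be modeled in user space with addresses as plain integers, as a sketch. priv_top and priv_bottom stand in for pcpu_priv_stack_top/bottom; depth stands in for the count passed in %rdx, with a subprog modeled as any depth other than 1. The function name and the "return 0 means goto exit" convention are hypothetical. One assumption goes beyond the pseudo code: on the first entry the main prog's own frame is also reserved below the top, so nested frames don't overlap it. The bound check is written as sp - priv_bottom < stack_size, which is equivalent to the "sub, then compare with bottom" test but cannot wrap around with unsigned arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the proposed JIT prologue (not kernel code).
 * Returns the new stack pointer, or 0 to model "goto exit".
 */
static uintptr_t priv_stack_prologue(int depth, uintptr_t sp, uintptr_t stack_size,
				     uintptr_t priv_top, uintptr_t priv_bottom)
{
	if (depth == 1) {
		/* First entry of the main prog: switch to the private stack.
		 * Assumption: its own frame is carved out below the top, and the
		 * allocation always fits at least one main-prog frame. */
		assert(priv_top - priv_bottom >= stack_size);
		return priv_top - stack_size;
	}

	/* Nested main-prog entry or subprog: grow down from the current sp.
	 * Same test as "sub rsp, stack_size; rsp < bottom", without wrap-around. */
	assert(sp >= priv_bottom && sp <= priv_top);
	if (sp - priv_bottom < stack_size)
		return 0;	/* stack is too small: exit */
	return sp - stack_size;
}
```

With top = 0x2000, bottom = 0x1000 and a 0x400-byte frame, depths 1 through 4 get stack pointers 0x1c00, 0x1800, 0x1400 and 0x1000, and a fifth level is refused.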

Since the stack bottom/top are known at JIT time, we can
generate reliable stack overflow checks.
That's much better than guard pages and -fstack-protector.
The prog can allocate a per-CPU area of
(stack size of main prog + subprogs + extra) * 4
bytes, and that will likely be enough.
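The sizing rule as arithmetic, in a short sketch. The function name, the constant name and the example byte counts are hypothetical. How the subprog stack sizes are combined into one number (for example, the deepest call chain) is left to the caller here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRIV_STACK_NEST_LIMIT 4	/* hypothetical name for the limit of 4 */

/*
 * Per-CPU private stack size following the rule above:
 * (stack size of main prog + subprogs + extra) * 4.
 */
static size_t priv_stack_alloc_size(size_t main_stack, size_t subprog_stack,
				    size_t extra)
{
	size_t per_level = main_stack + subprog_stack + extra;

	/* guard against overflow of the multiplication */
	assert(per_level <= SIZE_MAX / PRIV_STACK_NEST_LIMIT);
	return per_level * PRIV_STACK_NEST_LIMIT;
}
```

For a 256-byte main prog frame, 192 bytes of subprog frames and 64 bytes of extra, each CPU would get (256 + 192 + 64) * 4 = 2048 bytes.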
If not, the stack protection will gently exit the prog
when the stack is too deep.
kfuncs won't have such a check, so we need a buffer zone.
We could have a guard page too, but that feels like overkill.
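One possible reading of the buffer-zone remark, sketched as a layout computation: the bottom used by the JIT-emitted check sits above the true start of the per-CPU allocation, leaving a reserve that only unchecked kfunc calls can dip into. The struct, the function and the kfunc_reserve parameter are hypothetical, and addresses are plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one CPU's private stack area. */
struct priv_stack_layout {
	uintptr_t bottom;	/* lowest address the JIT check allows for BPF frames */
	uintptr_t top;		/* initial stack pointer for the first main-prog entry */
};

/*
 * base/size describe the per-CPU allocation; kfunc_reserve is the buffer zone
 * kept below 'bottom' for kfuncs, which run without the JIT's overflow check.
 */
static struct priv_stack_layout priv_stack_layout(uintptr_t base, uintptr_t size,
						  uintptr_t kfunc_reserve)
{
	struct priv_stack_layout l;

	assert(kfunc_reserve < size);
	l.bottom = base + kfunc_reserve;
	l.top = base + size;	/* stacks grow down from here */
	return l;
}
```

A BPF frame can then never start below bottom, so a kfunc called from the deepest allowed frame still has kfunc_reserve bytes before it runs off the allocation.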