Re: yet another approach Was: [PATCH bpf-next v3 4/5] bpf, x86: Add jit support for private stack

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Wed, 2 Oct 2024 02:22:40 +0200

On Tue, 1 Oct 2024 at 23:28, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Oct 1, 2024 at 1:51 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
> >
> > On Tue, 1 Oct 2024 at 21:53, Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > Another idea...
> > >
> >
> > Thanks for explaining why push/pop is still necessary. I agree then it
> > seems it cannot be avoided.
> >
> > > Currently the prologue looks like:
> > > push rbp
> > > mov rbp, rsp
> > > sub rsp, stack_depth
> > >
> > > how about in the main prog we keep the first two insns,
> > > but then set rsp with a single insn to point to the top
> > > of our private stack that should have enough room
> > > for stack_of_main_prog + stacks_of_all_subprogs + extra 8k for kfuncs/helpers.
> > >
> > > The prologue of all subprogs will stay as-is with above 3 insns.
> > > The epilogue is the same in main prog and subprogs: leave + ret.
> > >
> > > Such stack will look like a typical split stack used in compilers.
> > >
> > > The obvious advantage is we don't need to touch r9, do push/pop,
> > > and stack unwind will work just fine.
> > > In the past we discussed something like this, but
> > > then we did all 3 insns in the private stack
> > > and it was problematic due to IRQs.
> > > In this approach the main prog will use up to 512 bytes of
> > > kernel stack, but everything that it calls will be in the private stack,
> > > and since it doesn't migrate there is no per-cpu memory reuse issue.
> > >
> >
> > I think this is much better, but I'm wondering how the hierarchical
> > scheduling case will occur in reality.
> >
> > Will it be the main prog invoking a kfunc, that in turn invokes
> > another prog, which can do the same thing again?
>
> I believe that's the plan.
>
> > If so, the lack of using a private stack for main prog would be a
> > problem, right? Because effectively if we don't call into subprogs we
> > don't use the private stack at all, and all invocations share the same
> > kernel stack, which brings us back to the current state.
>
> Not quite. With the above proposal anything that the main prog
> calls (kfuncs, helpers, subprogs) will be using private stack
> prepared by that main prog.
> Then another 'struct bpf_prog' called from kfunc will use
> the stack prepared by the main prog and that 2nd main prog
> will prepare another priv stack for everything that 2nd main prog calls.
> So we can allow arbitrary depth.
> The only problem if the same 'struct bpf_prog' is called
> recursively (since it will use the same priv stack),
> but that issue we avoid with per prog recursion counter.
> So I think this proposal should work for all prog types
> except those where bpf_prog_check_recur() returns false.
>
> I think we can make it work with a special
> __bpf_prog_enter_limited_recur()
> that does this_cpu_inc_return(*(prog->active) and allows
> limited recursion (up to 4 ?) and then sets %rsp on entry
> to a different slot in preallocated private stack
> based on prog->active value.

Makes sense, though will we have cases where hierarchical scheduling
attaches the same prog at different points of the hierarchy? Then the
limit of 4 may not be enough (e.g. say with cgroup nested levels > 4).
It might be that all non-leaf progs simply call into the successor
prog until the leaf is hit and the actual logic kicks in. Then it
could all be the same progs while walking down the hierarchy until a
different prog is called for the leaf.

>
> > Instead can we set rbp to point to the top of the private stack in the
> > main prog itself?
>
> we cannot change both %rsp and %rbp atomically,
> so setting rsp and then adjusting rbp will break stack unwind.
> After 'mov %rsp, priv_stack_addr' we can do 2nd pair of
> push rbp
> mov rbp, rsp
> sub rsp, stack_size
> so that even the main prog will use the priv stack, but
> I'm not sure how unwinder will deal with that.
> So I would let the main prog use the kernel stack.

Ack. I think letting the main subprog use the kernel stack as you
propose is fine.