Re: [PATCH bpf-next v6 1/9] bpf: Allow each subprog having stack size of 512 bytes

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Tue, 22 Oct 2024 23:29:12 +0200

On Tue, 22 Oct 2024 at 22:41, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Oct 22, 2024 at 1:13 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >
> >
> > On 10/21/24 8:43 PM, Alexei Starovoitov wrote:
> > > On Mon, Oct 21, 2024 at 8:21 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> > >>>>           for (int i = 0; i < env->subprog_cnt; i++) {
> > >>>> -               if (!i || si[i].is_async_cb) {
> > >>>> -                       ret = check_max_stack_depth_subprog(env, i);
> > >>>> +               check_subprog = !i || (check_priv_stack ? si[i].is_cb : si[i].is_async_cb);
> > >>> why?
> > >>> This looks very suspicious.
> > >> This is to simplify jit. For example,
> > >>      main_prog   <=== main_prog_priv_stack_ptr
> > >>        subprog1  <=== there is a helper which has a callback_fn
> > >>                  <=== for example bpf_for_each_map_elem
> > >>
> > >>          callback_fn
> > >>            subprog2
> > >>
> > >> In callback_fn, we cannot simplify do
> > >>      r9 += stack_size_for_callback_fn
> > >> since r9 may have been clobbered between subprog1 and callback_fn.
> > >> That is why currently I allocate private_stack separately for callback_fn.
> > >>
> > >> Alternatively we could do
> > >>      callback_fn_priv_stack_ptr = main_prog_priv_stack_ptr + off
> > >> where off equals to (stack size tree main_prog+subprog1).
> > >> I can do this approach too with a little more information in prog->aux.
> > >> WDYT?
> > > I see. I think we're overcomplicating the verifier just to
> > > be able to do 'r9 += stack' in the subprog.
> > > The cases of async vs sync and directly vs kfunc/helper
> > > (and soon with inlining of kfuncs) are getting too hard
> > > to reason about.
> > >
> > > I think we need to go back to the earlier approach
> > > where every subprog had its own private stack and was
> > > setting up r9 = my_priv_stack in the prologue.
> > >
> > > I suspect it's possible to construct a convoluted subprog
> > > that calls itself a limited amount of time and the verifier allows that.
> > > I feel it will be easier to detect just that condition
> > > in the verifier and fallback to the normal stack.
> >
> > I tried a simple bpf prog below.
> >
> > $ cat private_stack_subprog_recur.c
> > // SPDX-License-Identifier: GPL-2.0
> >
> > #include <vmlinux.h>
> > #include <bpf/bpf_helpers.h>
> > #include <bpf/bpf_tracing.h>
> > #include "../bpf_testmod/bpf_testmod.h"
> >
> > char _license[] SEC("license") = "GPL";
> >
> > #if defined(__TARGET_ARCH_x86)
> > bool skip __attribute((__section__(".data"))) = false;
> > #else
> > bool skip = true;
> > #endif
> >
> > int i;
> >
> > __noinline static void subprog1(int level)
> > {
> >          if (level > 0) {
> >                  subprog1(level >> 1);
> >                  i++;
> >          }
> > }
> >
> > SEC("kprobe")
> > int prog1(void)
> > {
> >          subprog1(1);
> >          return 0;
> > }
> >
> > In the above prog, we have a recursion of subprog1. The
> > callchain is:
> >     prog -> subprog1 -> subprog1
> >
> > The insn-level verification is successful since argument
> > of subprog1() has precise value.
> >
> > But eventually, verification failed with the following message:
> >    the call stack of 8 frames is too deep !
> >
> > The error message is
> >                  if (frame >= MAX_CALL_FRAMES) {
> >                          verbose(env, "the call stack of %d frames is too deep !\n",
> >                                  frame);
> >                          return -E2BIG;
> >                  }
> > in function check_max_stack_depth_subprog().
> > Basically in function check_max_stack_depth_subprog(), tracing subprog
> > call is done only based on call insn. All conditionals are ignored.
> > In the above example, check_max_stack_depth_subprog() will have the
> > call graph like
> >      prog -> subprog1 -> subprog1 -> subprog1 -> subprog1 -> ...
> > and eventually hit the error.
> >
> > Basically with check_max_stack_depth_subprog() self recursion is not
> > possible for a bpf prog.
> >
> > This limitation is back to year 2017.
> >    commit 70a87ffea8ac  bpf: fix maximum stack depth tracking logic
> >
> > So I assume people really do not write progs with self recursion inside
> > the main prog (including subprogs).
>
> Thanks for checking this part.
>
> What about sync and async callbacks? Can they recurse?
>
> Since progs are preemptible is the following possible:
>
> __noinline static void subprog(void)
> {
>   /* delay */
> }
>
> static int timer_cb(void *map, int *key, void *val)
> {
>   subprog();
> }
>
> SEC("tc")
> int prog1(void)
> {
>     bpf_timer_set_callback(  &timer_cb);
>     subprog();
>     return 0;
> }
>
> timers use softirq.
> I'm not sure whether it's the same stack or not.
> So it may be borderline ok-ish for other reasons,
> but the question remains. Will subprog recurse this way?
>

Yes, but not in the normal ways.
There can be only one softirq context per-CPU (even on preemptible RT
with timers running in kthreads), but timer_cb can also be called
directly by the prog. So any other context the same prog can execute
in will allow it to call timer_cb while another invocation is
potentially preempted out on the same CPU.
It might be better to disallow direct calling such async callbacks,
because I'm not sure anyone relies on that behavior, but it is
something I've previously looked at (for exception_cb, which is
disallowed to be called directly due to the distinct way prologue is
set up).

We'll also need to remember this when/if we introduce hardirq mode for
BPF timers.