Re: [PATCH bpf-next v6 1/9] bpf: Allow each subprog having stack size of 512 bytes

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Tue, 22 Oct 2024 23:36:18 +0200

On Tue, 22 Oct 2024 at 23:29, Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>
> On Tue, 22 Oct 2024 at 22:41, Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Tue, Oct 22, 2024 at 1:13 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> > >
> > >
> > > On 10/21/24 8:43 PM, Alexei Starovoitov wrote:
> > > > On Mon, Oct 21, 2024 at 8:21 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> > > >>>>           for (int i = 0; i < env->subprog_cnt; i++) {
> > > >>>> -               if (!i || si[i].is_async_cb) {
> > > >>>> -                       ret = check_max_stack_depth_subprog(env, i);
> > > >>>> +               check_subprog = !i || (check_priv_stack ? si[i].is_cb : si[i].is_async_cb);
> > > >>> why?
> > > >>> This looks very suspicious.
> > > >> This is to simplify jit. For example,
> > > >>      main_prog   <=== main_prog_priv_stack_ptr
> > > >>        subprog1  <=== there is a helper which has a callback_fn
> > > >>                  <=== for example bpf_for_each_map_elem
> > > >>
> > > >>          callback_fn
> > > >>            subprog2
> > > >>
> > > >> In callback_fn, we cannot simplify do
> > > >>      r9 += stack_size_for_callback_fn
> > > >> since r9 may have been clobbered between subprog1 and callback_fn.
> > > >> That is why currently I allocate private_stack separately for callback_fn.
> > > >>
> > > >> Alternatively we could do
> > > >>      callback_fn_priv_stack_ptr = main_prog_priv_stack_ptr + off
> > > >> where off equals to (stack size tree main_prog+subprog1).
> > > >> I can do this approach too with a little more information in prog->aux.
> > > >> WDYT?
> > > > I see. I think we're overcomplicating the verifier just to
> > > > be able to do 'r9 += stack' in the subprog.
> > > > The cases of async vs sync and directly vs kfunc/helper
> > > > (and soon with inlining of kfuncs) are getting too hard
> > > > to reason about.
> > > >
> > > > I think we need to go back to the earlier approach
> > > > where every subprog had its own private stack and was
> > > > setting up r9 = my_priv_stack in the prologue.
> > > >
> > > > I suspect it's possible to construct a convoluted subprog
> > > > that calls itself a limited amount of time and the verifier allows that.
> > > > I feel it will be easier to detect just that condition
> > > > in the verifier and fallback to the normal stack.
> > >
> > > I tried a simple bpf prog below.
> > >
> > > $ cat private_stack_subprog_recur.c
> > > // SPDX-License-Identifier: GPL-2.0
> > >
> > > #include <vmlinux.h>
> > > #include <bpf/bpf_helpers.h>
> > > #include <bpf/bpf_tracing.h>
> > > #include "../bpf_testmod/bpf_testmod.h"
> > >
> > > char _license[] SEC("license") = "GPL";
> > >
> > > #if defined(__TARGET_ARCH_x86)
> > > bool skip __attribute((__section__(".data"))) = false;
> > > #else
> > > bool skip = true;
> > > #endif
> > >
> > > int i;
> > >
> > > __noinline static void subprog1(int level)
> > > {
> > >          if (level > 0) {
> > >                  subprog1(level >> 1);
> > >                  i++;
> > >          }
> > > }
> > >
> > > SEC("kprobe")
> > > int prog1(void)
> > > {
> > >          subprog1(1);
> > >          return 0;
> > > }
> > >
> > > In the above prog, we have a recursion of subprog1. The
> > > callchain is:
> > >     prog -> subprog1 -> subprog1
> > >
> > > The insn-level verification is successful since argument
> > > of subprog1() has precise value.
> > >
> > > But eventually, verification failed with the following message:
> > >    the call stack of 8 frames is too deep !
> > >
> > > The error message is
> > >                  if (frame >= MAX_CALL_FRAMES) {
> > >                          verbose(env, "the call stack of %d frames is too deep !\n",
> > >                                  frame);
> > >                          return -E2BIG;
> > >                  }
> > > in function check_max_stack_depth_subprog().
> > > Basically in function check_max_stack_depth_subprog(), tracing subprog
> > > call is done only based on call insn. All conditionals are ignored.
> > > In the above example, check_max_stack_depth_subprog() will have the
> > > call graph like
> > >      prog -> subprog1 -> subprog1 -> subprog1 -> subprog1 -> ...
> > > and eventually hit the error.
> > >
> > > Basically with check_max_stack_depth_subprog() self recursion is not
> > > possible for a bpf prog.
> > >
> > > This limitation is back to year 2017.
> > >    commit 70a87ffea8ac  bpf: fix maximum stack depth tracking logic
> > >
> > > So I assume people really do not write progs with self recursion inside
> > > the main prog (including subprogs).
> >
> > Thanks for checking this part.
> >
> > What about sync and async callbacks? Can they recurse?
> >
> > Since progs are preemptible is the following possible:
> >
> > __noinline static void subprog(void)
> > {
> >   /* delay */
> > }
> >
> > static int timer_cb(void *map, int *key, void *val)
> > {
> >   subprog();
> > }
> >
> > SEC("tc")
> > int prog1(void)
> > {
> >     bpf_timer_set_callback(  &timer_cb);
> >     subprog();
> >     return 0;
> > }
> >
> > timers use softirq.
> > I'm not sure whether it's the same stack or not.
> > So it may be borderline ok-ish for other reasons,
> > but the question remains. Will subprog recurse this way?
> >
>
> Yes, but not in the normal ways.
> There can be only one softirq context per-CPU (even on preemptible RT
> with timers running in kthreads), but timer_cb can also be called
> directly by the prog. So any other context the same prog can execute
> in will allow it to call timer_cb while another invocation is
> potentially preempted out on the same CPU.
> It might be better to disallow direct calling such async callbacks,
> because I'm not sure anyone relies on that behavior, but it is
> something I've previously looked at (for exception_cb, which is
> disallowed to be called directly due to the distinct way prologue is
> set up).

Ah, in your example it's a subprog() called by both. Yeah, I guess we
can't really prevent that from happening.

>
> We'll also need to remember this when/if we introduce hardirq mode for
> BPF timers.