Re: [PATCH bpf-next v6 1/9] bpf: Allow each subprog having stack size of 512 bytes

Yonghong Song <yonghong.song@xxxxxxxxx> · Tue, 22 Oct 2024 14:43:22 -0700

On 10/22/24 1:41 PM, Alexei Starovoitov wrote:
On Tue, Oct 22, 2024 at 1:13 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:

On 10/21/24 8:43 PM, Alexei Starovoitov wrote:
On Mon, Oct 21, 2024 at 8:21 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
           for (int i = 0; i < env->subprog_cnt; i++) {
-               if (!i || si[i].is_async_cb) {
-                       ret = check_max_stack_depth_subprog(env, i);
+               check_subprog = !i || (check_priv_stack ? si[i].is_cb : si[i].is_async_cb);
why?
This looks very suspicious.
This is to simplify jit. For example,
      main_prog   <=== main_prog_priv_stack_ptr
        subprog1  <=== there is a helper which has a callback_fn
                  <=== for example bpf_for_each_map_elem

          callback_fn
            subprog2

In callback_fn, we cannot simplify do
      r9 += stack_size_for_callback_fn
since r9 may have been clobbered between subprog1 and callback_fn.
That is why currently I allocate private_stack separately for callback_fn.

Alternatively we could do
      callback_fn_priv_stack_ptr = main_prog_priv_stack_ptr + off
where off equals to (stack size tree main_prog+subprog1).
I can do this approach too with a little more information in prog->aux.
WDYT?
I see. I think we're overcomplicating the verifier just to
be able to do 'r9 += stack' in the subprog.
The cases of async vs sync and directly vs kfunc/helper
(and soon with inlining of kfuncs) are getting too hard
to reason about.

I think we need to go back to the earlier approach
where every subprog had its own private stack and was
setting up r9 = my_priv_stack in the prologue.

I suspect it's possible to construct a convoluted subprog
that calls itself a limited amount of time and the verifier allows that.
I feel it will be easier to detect just that condition
in the verifier and fallback to the normal stack.
I tried a simple bpf prog below.

$ cat private_stack_subprog_recur.c
// SPDX-License-Identifier: GPL-2.0

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "../bpf_testmod/bpf_testmod.h"

char _license[] SEC("license") = "GPL";

#if defined(__TARGET_ARCH_x86)
bool skip __attribute((__section__(".data"))) = false;
#else
bool skip = true;
#endif

int i;

__noinline static void subprog1(int level)
{
          if (level > 0) {
                  subprog1(level >> 1);
                  i++;
          }
}

SEC("kprobe")
int prog1(void)
{
          subprog1(1);
          return 0;
}

In the above prog, we have a recursion of subprog1. The
callchain is:
     prog -> subprog1 -> subprog1

The insn-level verification is successful since argument
of subprog1() has precise value.

But eventually, verification failed with the following message:
    the call stack of 8 frames is too deep !

The error message is
                  if (frame >= MAX_CALL_FRAMES) {
                          verbose(env, "the call stack of %d frames is too deep !\n",
                                  frame);
                          return -E2BIG;
                  }
in function check_max_stack_depth_subprog().
Basically in function check_max_stack_depth_subprog(), tracing subprog
call is done only based on call insn. All conditionals are ignored.
In the above example, check_max_stack_depth_subprog() will have the
call graph like
      prog -> subprog1 -> subprog1 -> subprog1 -> subprog1 -> ...
and eventually hit the error.

Basically with check_max_stack_depth_subprog() self recursion is not
possible for a bpf prog.

This limitation is back to year 2017.
    commit 70a87ffea8ac  bpf: fix maximum stack depth tracking logic

So I assume people really do not write progs with self recursion inside
the main prog (including subprogs).
Thanks for checking this part.

What about sync and async callbacks? Can they recurse?

For sync, there will be no recurses between subprogs.
This is due to the following func.

static int check_max_stack_depth(struct bpf_verifier_env *env)
{
        struct bpf_subprog_info *si = env->subprog_info;
        int ret;

        for (int i = 0; i < env->subprog_cnt; i++) {
                if (!i || si[i].is_async_cb) {
                        ret = check_max_stack_depth_subprog(env, i);
                        if (ret < 0)
                                return ret;
                }
                continue;
        }
        return 0;
}

subprog root only starts from the main prog or async_cb.
So regular sync callback will is treated similar
to other direct-call subprog.

Since progs are preemptible is the following possible:

__noinline static void subprog(void)
{
   /* delay */
}

static int timer_cb(void *map, int *key, void *val)
{
   subprog();
}

SEC("tc")
int prog1(void)
{
     bpf_timer_set_callback(  &timer_cb);
     subprog();
     return 0;
}

timers use softirq.
I'm not sure whether it's the same stack or not.
So it may be borderline ok-ish for other reasons,
but the question remains. Will subprog recurse this way?

But for async cb, as you mentioned it is possible that
prog1->subprog could be called in process context
and the callback timer_cb->subprog could be called in
nested way on top of prog1->subprog.

To handle such cases, I guess I can refactor the code
to record maximum stack_tree_depth in subprog info and
do the checking after the subprog 0 and all async
progs are processed.

To handle a subprog may be used in more than one
subtree (subprog 0 tree or async tree), I need to
add a 'visited' field to bpf_subprog_info.
I think this should work.