Re: [PATCH bpf-next v6 1/9] bpf: Allow each subprog having stack size of 512 bytes

Yonghong Song <yonghong.song@xxxxxxxxx> · Tue, 22 Oct 2024 13:13:38 -0700

On 10/21/24 8:43 PM, Alexei Starovoitov wrote:
On Mon, Oct 21, 2024 at 8:21 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
          for (int i = 0; i < env->subprog_cnt; i++) {
-               if (!i || si[i].is_async_cb) {
-                       ret = check_max_stack_depth_subprog(env, i);
+               check_subprog = !i || (check_priv_stack ? si[i].is_cb : si[i].is_async_cb);
why?
This looks very suspicious.
This is to simplify jit. For example,
     main_prog   <=== main_prog_priv_stack_ptr
       subprog1  <=== there is a helper which has a callback_fn
                 <=== for example bpf_for_each_map_elem

         callback_fn
           subprog2

In callback_fn, we cannot simplify do
     r9 += stack_size_for_callback_fn
since r9 may have been clobbered between subprog1 and callback_fn.
That is why currently I allocate private_stack separately for callback_fn.

Alternatively we could do
     callback_fn_priv_stack_ptr = main_prog_priv_stack_ptr + off
where off equals to (stack size tree main_prog+subprog1).
I can do this approach too with a little more information in prog->aux.
WDYT?
I see. I think we're overcomplicating the verifier just to
be able to do 'r9 += stack' in the subprog.
The cases of async vs sync and directly vs kfunc/helper
(and soon with inlining of kfuncs) are getting too hard
to reason about.

I think we need to go back to the earlier approach
where every subprog had its own private stack and was
setting up r9 = my_priv_stack in the prologue.

I suspect it's possible to construct a convoluted subprog
that calls itself a limited amount of time and the verifier allows that.
I feel it will be easier to detect just that condition
in the verifier and fallback to the normal stack.

I tried a simple bpf prog below.

$ cat private_stack_subprog_recur.c
// SPDX-License-Identifier: GPL-2.0

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "../bpf_testmod/bpf_testmod.h"

char _license[] SEC("license") = "GPL";

#if defined(__TARGET_ARCH_x86)
bool skip __attribute((__section__(".data"))) = false;
#else
bool skip = true;
#endif

int i;

__noinline static void subprog1(int level)
{
        if (level > 0) {
                subprog1(level >> 1);
                i++;
        }
}

SEC("kprobe")
int prog1(void)
{
        subprog1(1);
        return 0;
}

In the above prog, we have a recursion of subprog1. The
callchain is:
   prog -> subprog1 -> subprog1

The insn-level verification is successful since argument
of subprog1() has precise value.

But eventually, verification failed with the following message:
  the call stack of 8 frames is too deep !

The error message is
                if (frame >= MAX_CALL_FRAMES) {
                        verbose(env, "the call stack of %d frames is too deep !\n",
                                frame);
                        return -E2BIG;
                }
in function check_max_stack_depth_subprog().
Basically in function check_max_stack_depth_subprog(), tracing subprog
call is done only based on call insn. All conditionals are ignored.
In the above example, check_max_stack_depth_subprog() will have the
call graph like
    prog -> subprog1 -> subprog1 -> subprog1 -> subprog1 -> ...
and eventually hit the error.

Basically with check_max_stack_depth_subprog() self recursion is not
possible for a bpf prog.

This limitation is back to year 2017.
  commit 70a87ffea8ac  bpf: fix maximum stack depth tracking logic

So I assume people really do not write progs with self recursion inside
the main prog (including subprogs).