On Thu, Feb 8, 2024 at 1:54 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
> With the latest llvm19, I hit the following selftest failures with
>
> $ ./test_progs -j
> libbpf: prog 'on_event': BPF program load failed: Permission denied
> libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG --
> combined stack size of 4 calls is 544. Too large
> verification time 1344153 usec
> stack depth 24+440+0+32
> processed 51008 insns (limit 1000000) max_states_per_insn 19 total_states 1467 peak_states 303 mark_read 146
> -- END PROG LOAD LOG --
> libbpf: prog 'on_event': failed to load: -13
> libbpf: failed to load object 'strobemeta_subprogs.bpf.o'
> scale_test:FAIL:expect_success unexpected error: -13 (errno 13)
> #498 verif_scale_strobemeta_subprogs:FAIL
>
> The verifier complains that the combined stack size (544 bytes) is too
> large, exceeding the maximum stack limit of 512 bytes. This is a
> regression from llvm19 ([1]).
>
> In the above error log, the original stack depth is 24+440+0+32.
> To satisfy the interpreter's needs, the verifier adjusts the stack depth
> to 32+448+32+32=544, which exceeds 512, hence the error. The same
> adjusted stack size is also used for the jit case.
>
> But the jitted code could use a smaller stack size:
>
> $ egrep -r stack_depth | grep round_up
> arm64/net/bpf_jit_comp.c: ctx->stack_size = round_up(prog->aux->stack_depth, 16);
> loongarch/net/bpf_jit.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
> powerpc/net/bpf_jit_comp.c: cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
> riscv/net/bpf_jit_comp32.c: round_up(ctx->prog->aux->stack_depth, STACK_ALIGN);
> riscv/net/bpf_jit_comp64.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
> s390/net/bpf_jit_comp.c: u32 stack_depth = round_up(fp->aux->stack_depth, 8);
> sparc/net/bpf_jit_comp_64.c: stack_needed += round_up(stack_depth, 16);
> x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));
> x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8);
> x86/net/bpf_jit_comp.c: round_up(stack_depth, 8));
> x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8);
> x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
>
> In the above, STACK_ALIGN in riscv/net/bpf_jit_comp32.c is defined as 16.
> So the jit stack is aligned to either 8 or 16 bytes: x86 and s390 use
> 8-byte stack alignment and the other architectures use 16-byte alignment.
>
> This patch calculates the total stack depth based on 16-byte alignment if
> jit is requested. For the above failing case, the new stack size is
> 32+448+0+32=512 and verification succeeds. The llvm19 regression will be
> discussed separately in llvm upstream.
>
> The verifier change caused three selftest failures, as these tests compare
> error messages that contain specific stack sizes. More specifically,
> - test_global_funcs/global_func1: restricted to interpreter-only runs,
>   since verification now succeeds in jit mode. A new test will be added
>   for jit mode later.
> - async_stack_depth/{pseudo_call_check, async_call_root_check}: since jit
>   and interpreter calculate different stack sizes, the failure msg is
>   adjusted to omit the specific stack size numbers.
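
For anyone trying to follow the arithmetic above, here is a quick
host-side C sketch (plain C, not kernel code) of the two rounding rules.
The per-subprog depths {24, 440, 0, 32} are taken from the
"stack depth 24+440+0+32" log line, and round_up_pow2() mirrors the
kernel's round_up() for power-of-two alignments. It prints 544 for the
interpreter rule and 512 for the 16-byte jit rule, matching the numbers
quoted above.

#include <stdio.h>

#define MAX_BPF_STACK 512

/* power-of-two round up, same result as the kernel's round_up() macro */
static unsigned int round_up_pow2(unsigned int x, unsigned int align)
{
        return (x + align - 1) & ~(align - 1);
}

int main(void)
{
        /* per-subprog stack depths from the verifier log above */
        unsigned int depths[] = { 24, 440, 0, 32 };
        unsigned int interp = 0, jit = 0;

        for (int i = 0; i < 4; i++) {
                /* interpreter rule: at least 1 byte, rounded up to 32 */
                interp += round_up_pow2(depths[i] > 1 ? depths[i] : 1, 32);
                /* 16-byte jit rule: a 0-depth subprog stays at 0 */
                jit += round_up_pow2(depths[i], 16);
        }

        printf("interpreter: %u (%s)\n", interp,
               interp > MAX_BPF_STACK ? "too large" : "ok");
        printf("jit:         %u (%s)\n", jit,
               jit > MAX_BPF_STACK ? "too large" : "ok");
        return 0;
}
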
>
> [1] https://lore.kernel.org/bpf/32bde0f0-1881-46c9-931a-673be566c61d@xxxxxxxxx/
>
> Suggested-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> ---
>  kernel/bpf/verifier.c                             | 18 +++++++++++++-----
>  .../bpf/prog_tests/test_global_funcs.c            |  5 ++++-
>  .../selftests/bpf/progs/async_stack_depth.c       |  4 ++--
>  3 files changed, 19 insertions(+), 8 deletions(-)
>
> Changelogs:
>   v2 -> v3:
>     - fix async_stack_depth test failure if jit is turned off
>   v1 -> v2:
>     - fix some selftest failures
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index ddaf09db1175..6441a540904b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5812,6 +5812,17 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
>                                         strict);
>  }
>
> +static int round_up_stack_depth(struct bpf_verifier_env *env, int stack_depth)
> +{
> +       if (env->prog->jit_requested)
> +               return round_up(stack_depth, 16);
> +
> +       /* round up to 32-bytes, since this is granularity
> +        * of interpreter stack size
> +        */
> +       return round_up(max_t(u32, stack_depth, 1), 32);
> +}
> +
>  /* starting from main bpf function walk all instructions of the function
>   * and recursively walk all callees that given function can call.
>   * Ignore jump and exit insns.
> @@ -5855,10 +5866,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx)
>                                 depth);
>                         return -EACCES;
>                 }
> -               /* round up to 32-bytes, since this is granularity
> -                * of interpreter stack size
> -                */
> -               depth += round_up(max_t(u32, subprog[idx].stack_depth, 1), 32);
> +               depth += round_up_stack_depth(env, subprog[idx].stack_depth);
>                 if (depth > MAX_BPF_STACK) {
>                         verbose(env, "combined stack size of %d calls is %d. Too large\n",
>                                 frame + 1, depth);
> @@ -5952,7 +5960,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx)
>          */
>         if (frame == 0)
>                 return 0;
> -       depth -= round_up(max_t(u32, subprog[idx].stack_depth, 1), 32);
> +       depth -= round_up_stack_depth(env, subprog[idx].stack_depth);
>         frame--;
>         i = ret_insn[frame];
>         idx = ret_prog[frame];
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
> index e905cbaf6b3d..a3a41680b38e 100644
> --- a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
> +++ b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
> @@ -138,7 +138,10 @@ static void subtest_ctx_arg_rewrite(void)
>
>  void test_test_global_funcs(void)
>  {
> -       RUN_TESTS(test_global_func1);
> +       if (!env.jit_enabled) {
> +               RUN_TESTS(test_global_func1);
> +       }

Can we increase the amount of used stack size to make it fail regardless
of JIT? That's what I was asking on v1, actually. We have those
arbitrarily sized buf[256] and buf[300], what prevents us from making
them a few bytes bigger so they are not affected by the 16 vs 32 byte
rounding?

> +
>         RUN_TESTS(test_global_func2);
>         RUN_TESTS(test_global_func3);
>         RUN_TESTS(test_global_func4);
> diff --git a/tools/testing/selftests/bpf/progs/async_stack_depth.c b/tools/testing/selftests/bpf/progs/async_stack_depth.c
> index 3517c0e01206..36734683acbd 100644
> --- a/tools/testing/selftests/bpf/progs/async_stack_depth.c
> +++ b/tools/testing/selftests/bpf/progs/async_stack_depth.c
> @@ -30,7 +30,7 @@ static int bad_timer_cb(void *map, int *key, struct bpf_timer *timer)
>  }
>
>  SEC("tc")
> -__failure __msg("combined stack size of 2 calls is 576. Too large")
> +__failure __msg("combined stack size of 2 calls is")
>  int pseudo_call_check(struct __sk_buff *ctx)
>  {
>         struct hmap_elem *elem;
>
> @@ -45,7 +45,7 @@ int pseudo_call_check(struct __sk_buff *ctx)
>  }
>
>  SEC("tc")
> -__failure __msg("combined stack size of 2 calls is 608. Too large")
> +__failure __msg("combined stack size of 2 calls is")
>  int async_call_root_check(struct __sk_buff *ctx)
>  {
>         struct hmap_elem *elem;
> --
> 2.39.3
>