On Thu, Feb 8, 2024 at 1:54 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
> With the latest llvm19, I hit the following selftest failures with
>
> $ ./test_progs -j
> libbpf: prog 'on_event': BPF program load failed: Permission denied
> libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG --
> combined stack size of 4 calls is 544. Too large
> verification time 1344153 usec
> stack depth 24+440+0+32
> processed 51008 insns (limit 1000000) max_states_per_insn 19 total_states 1467 peak_states 303 mark_read 146
> -- END PROG LOAD LOG --
> libbpf: prog 'on_event': failed to load: -13
> libbpf: failed to load object 'strobemeta_subprogs.bpf.o'
> scale_test:FAIL:expect_success unexpected error: -13 (errno 13)
> #498 verif_scale_strobemeta_subprogs:FAIL
>
> The verifier complains that the combined stack size (544 bytes) is too
> large, exceeding the maximum stack limit of 512 bytes. This is a
> regression from llvm19 ([1]).
>
> In the above error log, the original stack depth is 24+440+0+32.
> To satisfy the interpreter's needs, the verifier adjusts the stack depth
> to 32+448+32+32=544, which exceeds 512, hence the error. The same
> adjusted stack size is also used for the jit case.
>
> But the jitted code could use a smaller stack size:
>
> $ egrep -r stack_depth | grep round_up
> arm64/net/bpf_jit_comp.c: ctx->stack_size = round_up(prog->aux->stack_depth, 16);
> loongarch/net/bpf_jit.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
> powerpc/net/bpf_jit_comp.c: cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
> riscv/net/bpf_jit_comp32.c: round_up(ctx->prog->aux->stack_depth, STACK_ALIGN);
> riscv/net/bpf_jit_comp64.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
> s390/net/bpf_jit_comp.c: u32 stack_depth = round_up(fp->aux->stack_depth, 8);
> sparc/net/bpf_jit_comp_64.c: stack_needed += round_up(stack_depth, 16);
> x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));
> x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8);
> x86/net/bpf_jit_comp.c: round_up(stack_depth, 8));
> x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8);
> x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
>
> In the above, STACK_ALIGN in riscv/net/bpf_jit_comp32.c is defined as 16.
> So the jit stack is aligned to either 8 or 16 bytes: x86 and s390 use
> 8-byte stack alignment and the other architectures use 16-byte alignment.
>
> This patch calculates the total stack depth based on 16-byte alignment if
> jit is requested. For the above failing case, the new stack size is
> 32+448+0+32=512 and verification succeeds. The llvm19 regression will be
> discussed separately in llvm upstream.
>
> The verifier change caused three selftest failures, as these tests compare
> error messages that contain specific stack sizes. More specifically,
> - test_global_funcs/global_func1: restricted to interpreter-only runs,
>   since verification now succeeds in jit mode. A new test will be added
>   for jit mode later.
> - async_stack_depth/{pseudo_call_check, async_call_root_check}: since jit
>   and interpreter calculate different stack sizes, the failure msg is
>   adjusted to omit the specific stack size numbers.
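
For anyone trying to follow the arithmetic above, here is a quick
host-side C sketch (plain C, not kernel code) of the two rounding rules.
The per-subprog depths {24, 440, 0, 32} are taken from the
"stack depth 24+440+0+32" log line, and round_up_pow2() mirrors the
kernel's round_up() for power-of-two alignments. It prints 544 for the
interpreter rule and 512 for the 16-byte jit rule, matching the numbers
quoted above.

#include <stdio.h>

#define MAX_BPF_STACK 512

/* power-of-two round up, same result as the kernel's round_up() macro */
static unsigned int round_up_pow2(unsigned int x, unsigned int align)
{
        return (x + align - 1) & ~(align - 1);
}

int main(void)
{
        /* per-subprog stack depths from the verifier log above */
        unsigned int depths[] = { 24, 440, 0, 32 };
        unsigned int interp = 0, jit = 0;

        for (int i = 0; i < 4; i++) {
                /* interpreter rule: at least 1 byte, rounded up to 32 */
                interp += round_up_pow2(depths[i] > 1 ? depths[i] : 1, 32);
                /* 16-byte jit rule: a 0-depth subprog stays at 0 */
                jit += round_up_pow2(depths[i], 16);
        }

        printf("interpreter: %u (%s)\n", interp,
               interp > MAX_BPF_STACK ? "too large" : "ok");
        printf("jit:         %u (%s)\n", jit,
               jit > MAX_BPF_STACK ? "too large" : "ok");
        return 0;
}
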
>
> [1] https://lore.kernel.org/bpf/32bde0f0-1881-46c9-931a-673be566c61d@xxxxxxxxx/
>
> Suggested-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> ---
>  kernel/bpf/verifier.c                             | 18 +++++++++++++-----
>  .../bpf/prog_tests/test_global_funcs.c            |  5 ++++-
>  .../selftests/bpf/progs/async_stack_depth.c       |  4 ++--
>  3 files changed, 19 insertions(+), 8 deletions(-)
>
> Changelogs:
>   v2 -> v3:
>     - fix async_stack_depth test failure if jit is turned off
>   v1 -> v2:
>     - fix some selftest failures
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index ddaf09db1175..6441a540904b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5812,6 +5812,17 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
>                                         strict);
>  }
>
> +static int round_up_stack_depth(struct bpf_verifier_env *env, int stack_depth)
> +{
> +       if (env->prog->jit_requested)
> +               return round_up(stack_depth, 16);
> +
> +       /* round up to 32-bytes, since this is granularity
> +        * of interpreter stack size
> +        */
> +       return round_up(max_t(u32, stack_depth, 1), 32);
> +}
> +
>  /* starting from main bpf function walk all instructions of the function
>   * and recursively walk all callees that given function can call.
>   * Ignore jump and exit insns.
> @@ -5855,10 +5866,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx)
>                                 depth);
>                         return -EACCES;
>                 }
> -               /* round up to 32-bytes, since this is granularity
> -                * of interpreter stack size
> -                */
> -               depth += round_up(max_t(u32, subprog[idx].stack_depth, 1), 32);
> +               depth += round_up_stack_depth(env, subprog[idx].stack_depth);
>                 if (depth > MAX_BPF_STACK) {
>                         verbose(env, "combined stack size of %d calls is %d. Too large\n",
>                                 frame + 1, depth);
> @@ -5952,7 +5960,7 @@ static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx)
>          */
>         if (frame == 0)
>                 return 0;
> -       depth -= round_up(max_t(u32, subprog[idx].stack_depth, 1), 32);
> +       depth -= round_up_stack_depth(env, subprog[idx].stack_depth);
>         frame--;
>         i = ret_insn[frame];
>         idx = ret_prog[frame];
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
> index e905cbaf6b3d..a3a41680b38e 100644
> --- a/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
> +++ b/tools/testing/selftests/bpf/prog_tests/test_global_funcs.c
> @@ -138,7 +138,10 @@ static void subtest_ctx_arg_rewrite(void)
>
>  void test_test_global_funcs(void)
>  {
> -       RUN_TESTS(test_global_func1);
> +       if (!env.jit_enabled) {
> +               RUN_TESTS(test_global_func1);
> +       }

Can we increase the amount of used stack size to make it fail regardless
of JIT? That's what I was asking on v1, actually. We have those
arbitrarily sized buf[256] and buf[300], what prevents us from making
them a few bytes bigger so they are not affected by the 16 vs 32 byte
rounding?

> +
>         RUN_TESTS(test_global_func2);
>         RUN_TESTS(test_global_func3);
>         RUN_TESTS(test_global_func4);
> diff --git a/tools/testing/selftests/bpf/progs/async_stack_depth.c b/tools/testing/selftests/bpf/progs/async_stack_depth.c
> index 3517c0e01206..36734683acbd 100644
> --- a/tools/testing/selftests/bpf/progs/async_stack_depth.c
> +++ b/tools/testing/selftests/bpf/progs/async_stack_depth.c
> @@ -30,7 +30,7 @@ static int bad_timer_cb(void *map, int *key, struct bpf_timer *timer)
>  }
>
>  SEC("tc")
> -__failure __msg("combined stack size of 2 calls is 576. Too large")
> +__failure __msg("combined stack size of 2 calls is")
>  int pseudo_call_check(struct __sk_buff *ctx)
>  {
>         struct hmap_elem *elem;
>
> @@ -45,7 +45,7 @@ int pseudo_call_check(struct __sk_buff *ctx)
>  }
>
>  SEC("tc")
> -__failure __msg("combined stack size of 2 calls is 608. Too large")
> +__failure __msg("combined stack size of 2 calls is")
>  int async_call_root_check(struct __sk_buff *ctx)
>  {
>         struct hmap_elem *elem;
> --
> 2.39.3
>