On Mon, 2023-07-17 at 10:59 -0400, Luiz Capitulino wrote: > > On 2023-07-17 10:55, Eduard Zingerman wrote: > > > > > > > > > On Mon, 2023-07-17 at 09:04 -0400, Luiz Capitulino wrote: > > > Hi, > > > > > > The upstream commit below is backported to 5.10.186, 5.15.120 and 6.1.36: > > > > > > """ > > > commit ecdf985d7615356b78241fdb159c091830ed0380 > > > Author: Eduard Zingerman <eddyz87@xxxxxxxxx> > > > Date: Wed Feb 15 01:20:27 2023 +0200 > > > > > > bpf: track immediate values written to stack by BPF_ST instruction > > > """ > > > > > > This commit is causing the following bpf:test_verifier kselftest to fail: > > > > > > """ > > > # #760/p precise: ST insn causing spi > allocated_stack FAIL > > > """ > > > > > > > I can reproduce the error on 6.1.36 but don't understand what's causing it yet. > > The log is suspiciously different from master, will comment later today. > > Thank you very much for the prompt reply, Eduard. > > I'm available for further testing if needed. The test case in question looks as follows: ; old log new log 0: R1=ctx(off=0,imm=0) 0: r3 = r10 ; R3=fp0 R10=fp0 R3=fp0 R10=fp0 1: if r3 != 0x7b goto pc+0 ; R3=fp0 R3=fp0 2: *(u64 *)(r3 -8) = 0 ; R3=fp0 fp-8_w=mmmmmmmm R3=fp0 fp-8_w=00000000 3: r4 = *(u64 *)(r10 -8) ; R4_w=scalar() R10=fp0 R4_w=P0 R10=fp0 fp-8_w=00000000 4: r0 = -1 ; R0=-1 R0=-1 5: if r4 > r0 goto pc+0 6: exit The test checks the log generated by __mark_chain_precision() at instruction #5: registers `r4` and `r0` should be marked precise (in that order). The `r4` and `r0` are marked precise by check_cond_jmp_op() function because jump #5 is predicted (because comparison is unsigned `r4` is never greater than -1/`r0`). Failure happens because only log for `r0` markings is present. Patch [5] changes processing of instruction #2 ( *(u64 *)(r3 -8) = 0; ): - before patch fp[-8] is marked as mmmmmmmm (any 64-bit scalar value); - after patch fp[-8] is marked as 00000000. Which in turn changes processing of instruction #3 ( r4 = *(u64 *)(r10 -8); ): - before patch r4 is marked as unbound scalar - after patch r4 is marked as precise zero (check_stack_read_fixed_off() has a special case for zero loads, DST registers of such loads are always marked precise). If __mark_chain_precision() is called for a register that is already marked as precise no log is generated. Thus the log looks different when check_cond_jmp_op() calls __mark_chain_precision() for `r4` and then for `r0`: - before patch `r4` is not precise and log for it is generated; - after patch `r4` is precise and log for it is not generated. Now a question: why does current master has log for both `r4` and `r0`, given that commit [5] is present in master? This happens because of commit [2] (not back-ported to 6.1.36). Commit [2] adds function mark_all_scalars_imprecise() that removes precision marks from scalar registers when new state is created in a state chain (and for this test new states are created before #1 and #5 because of the BPF_F_TEST_STATE_FREQ flag). Effectively, before #5 is processed by check_cond_jmp_op() mark_all_scalars_imprecise() undoes `r4`'s precision mark assigned by check_stack_read_fixed_off() => log for both `r4` and `r0` is present. I need some time to convince myself that such interaction between check_stack_read_fixed_off() and mark_all_scalars_imprecise() is safe, but that's what we have in master. Commit [2] is a part of a series [4] by Andrii Nakryiko. This series had been partially back-ported already. Commits [0,1,2,3] are not yet back-ported and have to be picked altogether, otherwise there are numerous test failures. Still, when I cherry-pick [0,1,2,3] `./test_progs -a setget_sockopt` is failing. I'll investigate this failure but don't think I'll finish today. --- Alternatively, if the goal is to minimize amount of changes, we can disable or modify the 'precise: ST insn causing spi > allocated_stack'. --- Commits (in chronological order): [0] be2ef8161572 ("bpf: allow precision tracking for programs with subprogs") [1] f63181b6ae79 ("bpf: stop setting precise in current state") [2] 7a830b53c17b ("bpf: aggressively forget precise markings during state checkpointing") [3] 4f999b767769 ("selftests/bpf: make test_align selftest more robust") [4] 07d90c72efbe ("Merge branch 'BPF verifier precision tracking improvements'") [5] ecdf985d7615 ("bpf: track immediate values written to stack by BPF_ST instruction") > > - Luiz > > > > > > Since this test didn't fail before ecdf985d76 backport, the question is > > > if this is a test bug or if this commit introduced a regression. > > > > > > I haven't checked if this failure is present in latest Linus tree because > > > I was unable to build & run the bpf kselftests in an older distro. > > > > > > Also, there some important details about running the bpf kselftests > > > in 5.10 and 5.15: > > > > > > * On 5.10, bpf kselftest build is broken. The following upstream > > > commit needs to be cherry-picked for it to build & run: > > > > > > """ > > > commit 4237e9f4a96228ccc8a7abe5e4b30834323cd353 > > > Author: Gilad Reti <gilad.reti@xxxxxxxxx> > > > Date: Wed Jan 13 07:38:08 2021 +0200 > > > > > > selftests/bpf: Add verifier test for PTR_TO_MEM spill > > > """ > > > > > > * On 5.15.120 there's one additional test that's failing, but I didn't > > > debug this one: > > > > > > """ > > > #150/p calls: trigger reg2btf_ids[reg→type] for reg→type > __BPF_REG_TYPE_MAX FAIL > > > FAIL > > > """ > > > > > > * On 5.11 onwards, building and running bpf tests is disabled by > > > default by commit 7a6eb7c34a78498742b5f82543b7a68c1c443329, so I wonder > > > if we want to backport this to 5.10 as well? > > > > > > Thanks! > > > > > > - Luiz > > > > >