On Thu, 2024-01-04 at 15:09 -0800, Andrii Nakryiko wrote: [...] > > This seemed logical at the time of discussion, however, I can't figure > > a counter example at the moment. It appears that whatever are > > assumptions in check_stack_write_var_off() if spill is used in the > > precise context it would be marked eventually. > > E.g. the following is correctly rejected: > > > > SEC("raw_tp") > > __log_level(2) __flag(BPF_F_TEST_STATE_FREQ) > > __failure > > __naked void var_stack_1(void) > > { > > asm volatile ( > > "call %[bpf_get_prandom_u32];" > > "r9 = 100500;" > > "if r0 > 42 goto +1;" > > "r9 = 0;" > > "*(u64 *)(r10 - 16) = r9;" > > "call %[bpf_get_prandom_u32];" > > "r0 &= 0xf;" > > "r1 = -1;" > > "r1 -= r0;" > > "r2 = r10;" > > "r2 += r1;" > > "r0 = 0;" > > "*(u8 *)(r2 + 0) = r0;" > > "r1 = %[two_byte_buf];" > > "r2 = *(u32 *)(r10 -16);" > > "r1 += r2;" > > "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */ > > "exit;" > > : > > : __imm_ptr(two_byte_buf), > > __imm(bpf_get_prandom_u32) > > : __clobber_common); > > } > > > > So now I'm not sure :( > > Sorry for too much noise. > > > hm... does that test have to do so many things and do all these u64 vs > u32 vs u8 conversions? The test is actually quite minimal, the longest part is conjuring of varying offset pointer in r2, here it is with additional comments: /* Write 0 or 100500 to fp-16, 0 is on the first verification pass */ "call %[bpf_get_prandom_u32];" "r9 = 100500;" "if r0 > 42 goto +1;" "r9 = 0;" "*(u64 *)(r10 - 16) = r9;" /* prepare a variable length access */ "call %[bpf_get_prandom_u32];" "r0 &= 0xf;" /* r0 range is [0; 15] */ "r1 = -1;" "r1 -= r0;" /* r1 range is [-16; -1] */ "r2 = r10;" "r2 += r1;" /* r2 range is [fp-16; fp-1] */ /* do a variable length write of constant 0 */ "r0 = 0;" "*(u8 *)(r2 + 0) = r0;" /* use fp-16 to access an array of length 2 */ "r1 = %[two_byte_buf];" "r2 = *(u32 *)(r10 -16);" "r1 += r2;" "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */ "exit;" > Can we try a simple test were we spill u64 > SCALAR (imprecise) zero register to fp-8 or fp-16, and then use those > fp-8|fp-16 slot as an index into an array in precise context. Then > have a separate delayed branch that will write non-zero to fp-8|fp-16. > States shouldn't converge and this should be rejected. That is what test above does but it also includes varying offset access. [...]