On Thu, Jan 4, 2024 at 3:29 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > On Thu, 2024-01-04 at 15:09 -0800, Andrii Nakryiko wrote: > [...] > > > This seemed logical at the time of discussion, however, I can't figure > > > a counter example at the moment. It appears that whatever are > > > assumptions in check_stack_write_var_off() if spill is used in the > > > precise context it would be marked eventually. > > > E.g. the following is correctly rejected: > > > > > > SEC("raw_tp") > > > __log_level(2) __flag(BPF_F_TEST_STATE_FREQ) > > > __failure > > > __naked void var_stack_1(void) > > > { > > > asm volatile ( > > > "call %[bpf_get_prandom_u32];" > > > "r9 = 100500;" > > > "if r0 > 42 goto +1;" > > > "r9 = 0;" > > > "*(u64 *)(r10 - 16) = r9;" > > > "call %[bpf_get_prandom_u32];" > > > "r0 &= 0xf;" > > > "r1 = -1;" > > > "r1 -= r0;" > > > "r2 = r10;" > > > "r2 += r1;" > > > "r0 = 0;" > > > "*(u8 *)(r2 + 0) = r0;" > > > "r1 = %[two_byte_buf];" > > > "r2 = *(u32 *)(r10 -16);" > > > "r1 += r2;" > > > "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */ > > > "exit;" > > > : > > > : __imm_ptr(two_byte_buf), > > > __imm(bpf_get_prandom_u32) > > > : __clobber_common); > > > } > > > > > > So now I'm not sure :( > > > Sorry for too much noise. > > > > > > hm... does that test have to do so many things and do all these u64 vs > > u32 vs u8 conversions? > > The test is actually quite minimal, the longest part is conjuring of > varying offset pointer in r2, here it is with additional comments: > > /* Write 0 or 100500 to fp-16, 0 is on the first verification pass */ > "call %[bpf_get_prandom_u32];" > "r9 = 100500;" > "if r0 > 42 goto +1;" > "r9 = 0;" > "*(u64 *)(r10 - 16) = r9;" > /* prepare a variable length access */ > "call %[bpf_get_prandom_u32];" > "r0 &= 0xf;" /* r0 range is [0; 15] */ > "r1 = -1;" > "r1 -= r0;" /* r1 range is [-16; -1] */ > "r2 = r10;" > "r2 += r1;" /* r2 range is [fp-16; fp-1] */ > /* do a variable length write of constant 0 */ > "r0 = 0;" > "*(u8 *)(r2 + 0) = r0;" I meant this u8 > /* use fp-16 to access an array of length 2 */ > "r1 = %[two_byte_buf];" > "r2 = *(u32 *)(r10 -16);" and this u32. I'm not saying it's anything wrong, but it's simpler to deal with u64 consistently. There is nothing wrong with the test per se, I'm just saying we should try eliminate unnecessary cross-plays with narrowing/widening stores/loads. But that's offtopic, sorry. > "r1 += r2;" > "*(u8 *)(r1 + 0) = r2;" /* this should not be fine */ > "exit;" > > > Can we try a simple test were we spill u64 > > SCALAR (imprecise) zero register to fp-8 or fp-16, and then use those > > fp-8|fp-16 slot as an index into an array in precise context. Then > > have a separate delayed branch that will write non-zero to fp-8|fp-16. > > States shouldn't converge and this should be rejected. > > That is what test above does but it also includes varying offset access. > Yes, and the test fails, but if you read the log, you'll see that fp-8 is never marked precise, but it should. So we need more elaborate test that would somehow exploit fp-8 imprecision. I ran out of time. But what I tried was replacing "r2 = *(u32 *)(r10 -16);" with "r2 = *(u8 *)(r2 +0);" So keep both read and write as variable offset. And we are saved by some missing logic in read_var_off that would set r2 as known zero (because it should be for the branch where both fp-8 and fp-16 are zero). But that fails in the branch that should succeed, and if that branch actually succeeds, I suspect the branch where we initialize with non-zero r9 will erroneously succeed. Anyways, I still claim that we are mishandling a precision of spilled register when doing zero var_off writes. > [...]