On Thu, Jan 4, 2024 at 1:30 PM Barret Rhoden <brho@xxxxxxxxxx> wrote:
>
> On 1/4/24 12:31, Yonghong Song wrote:
> [snip]
>
> >>> +/*
> >>> + * Access an array element within a bound, such that the verifier
> >>> + * knows the access is safe.
> >>> + *
> >>> + * This macro asm is the equivalent of:
> >>> + *
> >>> + *   if (!arr)
> >>> + *           return NULL;
> >>> + *   if (idx >= arr_sz)
> >>> + *           return NULL;
> >>> + *   return &arr[idx];
> >>> + *
> >>> + * The index (___idx below) needs to be a u64, at least for certain
> >>> + * versions of the BPF ISA, since there aren't u32 conditional jumps.
> >>> + */
> >>> +#define bpf_array_elem(arr, arr_sz, idx) ({                          \
> >>> +   typeof(&(arr)[0]) ___arr = arr;                                   \
> >>> +   __u64 ___idx = idx;                                               \
> >>> +   if (___arr) {                                                     \
> >>> +           asm volatile("if %[__idx] >= %[__bound] goto 1f;          \
> >>> +                         %[__idx] *= %[__size];                      \
> >>> +                         %[__arr] += %[__idx];                       \
> >>> +                         goto 2f;                                    \
> >>> +                         1:;                                         \
> >>> +                         %[__arr] = 0;                               \
> >>> +                         2:                                          \
> >>> +                         "                                           \
> >>> +                        : [__arr]"+r"(___arr), [__idx]"+r"(___idx)   \
> >>> +                        : [__bound]"r"((arr_sz)),                    \
> >>> +                          [__size]"i"(sizeof(typeof((arr)[0])))      \
> >>> +                        : "cc");                                     \
> >>> +   }                                                                 \
> >>> +   ___arr;                                                           \
> >>> +})
> >
> > The LLVM bpf backend has made some improvements to handle cases like
> >
> >     r1 = ...
> >     r2 = r1 + 1
> >     if (r2 < num) ...
> >     using r1
> >
> > by preventing generation of the above code pattern.
> >
> > The implementation is a pattern-matching style, so surely it won't be
> > able to cover all cases.
> >
> > Do you have specific examples which have verification failures due to
> > false array out-of-bounds accesses?
>
> Not in a small example. =(
>
> This bug has an example, but it was part of a larger program:
> https://github.com/google/ghost-userspace/issues/31
>
> The rough progression was:
> - Sometimes the compiler optimizes out the checks, so we added a macro
>   to make the compiler not know the value of the variable anymore.
> - Then the compiler would occasionally do the check on a copy of the
>   register, so we did the comparison and index operation all in assembly.
>
>
> I tried using bpf_cmp_likely() in my actual program (not just a one-off
> test), and still had a verifier issue.  It's a large and convoluted
> program, so it might be hard to get a small reproducer.  But it is a
> different compiler issue than the one you mentioned.
>
> Specifically, I swapped out my array-access macro for this one, using
> bpf_cmp_likely():
>
> #define bpf_array_elem(arr, arr_sz, idx) ({                      \
>       typeof(&(arr)[0]) ___arr = arr;                            \
>       typeof(&(arr)[0]) ___ret = 0;                              \
>       u64 ___idx = idx;                                          \
>       if (___arr && bpf_cmp_likely(___idx, <, arr_sz))           \
>               ___ret = &___arr[___idx];                          \
>       ___ret;                                                    \
> })
>
> which should be the same logic as before:
>
>  *   if (!arr)
>  *           return NULL;
>  *   if (idx >= arr_sz)
>  *           return NULL;
>  *   return &arr[idx];
>
> The issue I run into is different from the one you had.  The compiler
> did the bounds check, but then for some reason recreated the index.  The
> index is coming from another map.
>
> Arguably, the verifier is doing its job - that value could have changed.
> I just don't want the compiler to do the reread or any other
> shenanigans in between the bounds check and the usage.
>
> The guts of the error:
> - r0 is the map (L127)
> - r1 is the index, read from another map (L128)
> - r1 gets verified to be less than 0x200 (L129)
> - some other stuff happens
> - r1 gets read again, and is no longer bound (L132)
> - r1 gets scaled up by 896
>   (896*0x200 = 0x70000 would be the real bound, but r1 lost the 0x200 bound)
> - r0 indexed by the bad r1 (L134)
> - blow up (L143)
>
> 127: (15) if r0 == 0x0 goto pc+1218  ; R0=map_value(off=0,ks=4,vs=458752,imm=0)
> 128: (79) r1 = *(u64 *)(r10 -40)     ; R1_w=Pscalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
> 129: (35) if r1 >= 0x200 goto pc+1216 ; R1_w=Pscalar(umax=511,var_off=(0x0; 0x1ff))
> 130: (79) r4 = *(u64 *)(r10 -56)     ; R4_w=Pscalar() R10=fp0
> 131: (37) r4 /= 1000                 ; R4_w=Pscalar()
> 132: (79) r1 = *(u64 *)(r10 -40)     ; R1_w=Pscalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
> 133: (27) r1 *= 896                  ; R1_w=Pscalar(umax=3848290696320,var_off=(0x0; 0x3ffffffff80),s32_max=2147483520,u32_max=-128)
> 134: (0f) r0 += r1                   ; R0_w=map_value(off=0,ks=4,vs=458752,umax=3848290696320,var_off=(0x0; 0x3ffffffff80),s32_max=2147483520,u32_max=-128) R1_w=Pscalar(umax=3848290696320,var_off=(0x0; 0x3ffffffff80),s32_max=2147483520,u32_max=-128)
> 135: (79) r3 = *(u64 *)(r10 -48)     ; R3_w=map_value(off=0,ks=4,vs=15728640,imm=0) R10=fp0
> 136: (0f) r3 += r8                   ; R3_w=map_value(off=0,ks=4,vs=15728640,umax=15728400,var_off=(0x0; 0xfffff0),s32_max=16777200,u32_max=16777200) R8=Pscalar(umax=15728400,var_off=(0x0; 0xfffff0))
> 137: (61) r1 = *(u32 *)(r7 +16)      ; R1_w=Pscalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R7=map_value(id=18779,off=0,ks=4,vs=224,imm=0)
> 138: (79) r2 = *(u64 *)(r3 +88)      ; R2=Pscalar() R3=map_value(off=0,ks=4,vs=15728640,umax=15728400,var_off=(0x0; 0xfffff0),s32_max=16777200,u32_max=16777200)
> 139: (a5) if r1 < 0x9 goto pc+1      ; R1=Pscalar(umin=9,umax=4294967295,var_off=(0x0; 0xffffffff))
> 140: (b7) r1 = 0                     ; R1_w=P0
> 141: (27) r1 *= 72                   ; R1_w=P0
> 142: (0f) r0 += r1                   ; R0_w=map_value(off=0,ks=4,vs=458752,umax=3848290696320,var_off=(0x0; 0x3ffffffff80),s32_max=2147483520,u32_max=-128) R1_w=P0
> 143: (7b) *(u64 *)(r0 +152) = r2
>
>
> If I put in a little ASM magic to tell the compiler not to recreate the
> index, it works, like so:
>
> #define BPF_MUST_CHECK(x) ({ asm volatile ("" : "+r"(x)); x; })
>
> #define bpf_array_elem(arr, arr_sz, idx) ({                      \
>       typeof(&(arr)[0]) ___arr = arr;                            \
>       typeof(&(arr)[0]) ___ret = 0;                              \
>       u64 ___idx = idx;                                          \
>       BPF_MUST_CHECK(___idx);                                    \
>       if (___arr && bpf_cmp_likely(___idx, <, arr_sz))           \
>               ___ret = &___arr[___idx];                          \
>       ___ret;                                                    \
> })
>
> Though anecdotally, that only stops the "reread the index from its map"
> problem, similar to a READ_ONCE.  The compiler is still free to just use
> another register for the check.
>
> The bit of ASM I had from a while back that did that was:
>
>  * r2 = r8
>  * r2 <<= 32
>  * r2 >>= 32
>  * if r2 > 0x3ff goto pc+29
>  * r8 <<= 32
>  * r8 >>= 32
>  * r8 <<= 6
>  * r0 += r8
>  * *(u64 *)(r0 +48) = r3
>
> where r2 was bounds checked, but r8 was used instead.
>
> I'll play around and see if I can come up with a selftest that can run
> into any of these "you did the check, but threw the check away" scenarios.

Before we add a full-asm bpf_array_elem() macro, let's fully understand the
issue first. Maybe it's an llvm deficiency or a verifier miss that can be
addressed. asm everywhere isn't a viable approach long term.

First, start with:

  asm volatile ("" : "+r"((short)x));

It will avoid the unnecessary <<= 32, >>= 32 in -mcpu=v3,v4.

Then do:

  if (likely(___arr) && bpf_cmp_likely(___idx, <, arr_sz))
      ^^^

just to have the expected basic block layout, because that's what your asm does.

And, of course, a selftest is necessary to debug this further.
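
For concreteness, here is a rough, untested sketch (illustration only, not a
final proposal) of Barret's macro with those suggestions folded in: the
likely() hint on the array pointer so the in-bounds path is the fallthrough
block, plus the BPF_MUST_CHECK() barrier from the message above so the
compiler cannot re-read the index between the bounds check and the use. It
assumes bpf_cmp_likely() is available from the selftests' bpf_experimental.h
and that a kernel-style likely() is defined:

/* Keep the index pinned in a register across the check and the access. */
#define BPF_MUST_CHECK(x) ({ asm volatile ("" : "+r"(x)); x; })

#define bpf_array_elem(arr, arr_sz, idx) ({                       \
	typeof(&(arr)[0]) ___arr = arr;                            \
	typeof(&(arr)[0]) ___ret = 0;                              \
	u64 ___idx = idx;                                          \
	BPF_MUST_CHECK(___idx);                                    \
	if (likely(___arr) && bpf_cmp_likely(___idx, <, arr_sz))  \
		___ret = &___arr[___idx];                          \
	___ret;                                                    \
})

Whether the "+r"((short)x) form of the constraint suggested above also helps
here is left open, so this sketch keeps the plain "+r"(x) barrier from
Barret's version.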