On Tue, Nov 23, 2021 at 12:08 PM Yonghong Song <yhs@xxxxxx> wrote: > On 11/23/21 8:15 AM, YiFei Zhu wrote: > > On Mon, Nov 22, 2021 at 4:24 PM Yonghong Song <yhs@xxxxxx> wrote: > >> On 11/22/21 12:44 PM, YiFei Zhu wrote: > >>> On Mon, Nov 22, 2021 at 8:19 AM YiFei Zhu <zhuyifei@xxxxxxxxxx> wrote: > >>>> > >>>> Hi > >>>> > >>>> I've been investigating the use of BPF CO-RE. I discovered that if I > >>>> include vmlinux.h and have all structures annotated with > >>>> __attribute__((preserve_access_index)), including the context struct, > >>>> then a prog that accesses an array field in the context struct, in > >>>> some particular way, cannot pass the verifier. > >>>> > >>>> A bunch of manual reduction plus creduce gives me this output: > >>>> > >>>> struct bpf_sock_ops { > >>>> int family; > >>>> int remote_ip6[]; > >>>> } __attribute__((preserve_access_index)); > >>>> __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > >>>> int a = d->family; > >>>> int *c = d->remote_ip6; > >>>> c[2] = a; > >>>> return 0; > >>>> } > >>>> > >>>> With Debian clang version 11.1.0-4+build1, this compiles to > >>>> > >>>> 0000000000000000 <b>: > >>>> 0: b7 02 00 00 04 00 00 00 r2 = 4 > >>>> 1: bf 13 00 00 00 00 00 00 r3 = r1 > >>>> 2: 0f 23 00 00 00 00 00 00 r3 += r2 > >>>> 3: 61 11 00 00 00 00 00 00 r1 = *(u32 *)(r1 + 0) > >>>> 4: 63 13 08 00 00 00 00 00 *(u32 *)(r3 + 8) = r1 > >>>> 5: b7 00 00 00 00 00 00 00 r0 = 0 > >>>> 6: 95 00 00 00 00 00 00 00 exit > >>>> > >>>> And the prog will be rejected with this verifier log: > >>>> > >>>> ; __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > >>>> 0: (b7) r2 = 32 > >>>> 1: (bf) r3 = r1 > >>>> 2: (0f) r3 += r2 > >>>> last_idx 2 first_idx 0 > >>>> regs=4 stack=0 before 1: (bf) r3 = r1 > >>>> regs=4 stack=0 before 0: (b7) r2 = 32 > >>>> ; int a = d->family; > >>>> 3: (61) r1 = *(u32 *)(r1 +20) > >>>> ; c[2] = a; > >>>> 4: (63) *(u32 *)(r3 +8) = r1 > >>>> dereference of modified ctx ptr R3 off=32 disallowed > >>>> processed 5 insns (limit 1000000) max_states_per_insn 0 total_states > >>>> 0 peak_states 0 mark_read 0 > >>>> > >>>> Looking at check_ctx_reg() and its callsite at check_mem_access() in > >>>> verifier.c, it seems that the verifier really does not like when the > >>>> context pointer has an offset, in this case the field offset of > >>>> d->remote_ip6. > >>>> > >>>> I thought this is just an issue with array fields, that field offset > >>>> relocations may have trouble expressing two field accesses (one struct > >>>> member, one array memory). However, further testing reveals that this > >>>> is not the case, because if I simplify out the local variables, the > >>>> error is gone: > >>>> > >>>> struct bpf_sock_ops { > >>>> int family; > >>>> int remote_ip6[]; > >>>> } __attribute__((preserve_access_index)); > >>>> __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) { > >>>> d->remote_ip6[2] = d->family; > >>>> return 0; > >>>> } > >>>> > >>>> is compiled to: > >>>> > >>>> 0000000000000000 <b>: > >>>> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > >>>> 1: 63 21 0c 00 00 00 00 00 *(u32 *)(r1 + 12) = r2 > >>>> 2: b7 00 00 00 00 00 00 00 r0 = 0 > >>>> 3: 95 00 00 00 00 00 00 00 exit > >>>> > >>>> and is loaded as: > >>>> > >>>> ; d->remote_ip6[2] = d->family; > >>>> 0: (61) r2 = *(u32 *)(r1 +20) > >>>> ; d->remote_ip6[2] = d->family; > >>>> 1: (63) *(u32 *)(r1 +40) = r2 > >>>> invalid bpf_context access off=40 size=4 > >>>> > >>>> I believe this error is because d->remote_ip6 is read-only, that this > >>>> modification might be more of a product of creduce, but we can see > >>>> that the CO-RE adjected offset of the array element from the context > >>>> pointer is correct: 32 to remote_ip6, 8 array index, so total offset > >>>> is 40. > >>>> > >>>> Also note that removal of __attribute__((preserve_access_index)) from > >>>> the first (miscompiled) program produces exactly the same bytecode as > >>>> this new program (with no locals). > >>>> > >>>> What is going on here? Why does the access of an array in context in > >>>> this particular way cause it to generate code that would not pass the > >>>> verifier? Is it a bug in Clang/LLVM, or is it the verifier being too > >>>> strict? > >>> > >>> Additionally, testing the latest LLVM main branch, this test case, > >>> which does not touch array fields at all, fails but succeeded with > >>> clang version 11.1.0: > >>> > >>> struct bpf_sock_ops { > >>> int op; > >>> int bpf_sock_ops_cb_flags; > >>> } __attribute__((preserve_access_index)); > >>> enum { a, b } static (*c)() = (void *)9; > >>> enum d { e } f; > >>> enum d g; > >>> __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) { > >>> switch (i->op) { > >>> case a: > >>> f = g = c(i, i->bpf_sock_ops_cb_flags); > >>> break; > >>> case b: > >>> f = g = c(i, i->bpf_sock_ops_cb_flags); > >>> } > >>> return 0; > >>> } > >> > >> This is another issue which actually appears (even in bpf mailing list) > >> multiple times. > >> > >> The following change should fix the issue: > >> > >> $ diff t1.c t1-good.c > >> --- t1.c 2021-11-22 16:00:13.915921544 -0800 > >> +++ t1-good.c 2021-11-22 16:12:32.823710102 -0800 > >> @@ -5,13 +5,14 @@ > >> enum { a, b } static (*c)() = (void *)9; > >> enum d { e } f; > >> enum d g; > >> + #define __barrier asm volatile("" ::: "memory") > >> __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) { > >> switch (i->op) { > >> case a: > >> - f = g = c(i, i->bpf_sock_ops_cb_flags); > >> + f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier; > >> break; > >> case b: > >> - f = g = c(i, i->bpf_sock_ops_cb_flags); > >> + f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier; > >> } > >> return 0; > >> } > >> > >> Basically add a compiler barrier at the end of case statements > >> to prevent common code sinking. > >> > >> In the above case, for the original code, latest compiler did an > >> optimization like > >> case a: > >> tmp = reloc_offset(i->bpf_sock_ops_cb_flags); > >> case b: > >> tmp = reloc_offset(i->bpf_sock_ops_cb_flags); > >> common: > >> val = load r1, tmp > >> ... > >> > >> Note that reloc_offset is not sinked to the common code > >> due to its special internal representation. > >> > >> To avoid such a code generation, add compiler barrier to > >> the end of case statement to prevent load sinking in which case > >> we will have > >> val = load r1, reloc_offset(...) > >> and verifier will be happy about this. > >> > >> The commit you listed below had a big change which may enable > >> the above compiler optimization and llvm11 may just not do > >> the code sinking optimization in this particular instance. > >> > >> I guess the compiler could still enforce this. But since it does > >> not know whether the memory access is for context or not, doing > >> so might hurt performance. But any way, this has appeared multiple > >> times internally and also in the mailing list. We will take a further > >> look. > > > > I see, thanks for the explanations. What is interesting is that prior > > to that commit reloc_offset(i->bpf_sock_ops_cb_flags) is generated > > only once. The disassembly matches that of > > case a: > > case b: > > tmp = reloc_offset(i->bpf_sock_ops_cb_flags); > > val = load r1, tmp > > > > Whereas with the compiler barriers, both compilers generate (no common code): > > > > 0000000000000000 <h>: > > 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > > 1: 15 02 0a 00 01 00 00 00 if r2 == 1 goto +10 <LBB0_3> > > 2: 55 02 11 00 00 00 00 00 if r2 != 0 goto +17 <LBB0_4> > > 3: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) > > 4: 85 00 00 00 09 00 00 00 call 9 > > 5: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > > 7: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > 8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > > 10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > 11: 05 00 08 00 00 00 00 00 goto +8 <LBB0_4> > > > > 0000000000000060 <LBB0_3>: > > 12: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) > > 13: 85 00 00 00 09 00 00 00 call 9 > > 14: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > > 16: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > 17: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > > 19: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > > > > 00000000000000a0 <LBB0_4>: > > 20: b7 00 00 00 00 00 00 00 r0 = 0 > > 21: 95 00 00 00 00 00 00 00 exit > > > > Did the linked commit create the special internal representation so > > that they cannot be common code sinked, or is there some other issue > > going on, or am I misunderstanding something? > > Yes, the linked commit added a special builtin with additional > ever-increasing argument to prevent reloc_offset from sinking. > This is to ensure the relocation related codes won't be separated into > different basic blocks. But this won't be able to prevent the issue > you described in the above. Ah. I see. Thanks for the explanations. Looking forward to see the fixes! YiFei Zhu > > Thanks > > YiFei Zhu > >>> The bad code generation of latest LLVM: > >>> > >>> 0000000000000000 <h>: > >>> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > >>> 1: 15 02 01 00 01 00 00 00 if r2 == 1 goto +1 <LBB0_2> > >>> 2: 55 02 0b 00 00 00 00 00 if r2 != 0 goto +11 <LBB0_3> > >>> > >>> 0000000000000018 <LBB0_2>: > >>> 3: b7 03 00 00 04 00 00 00 r3 = 4 > >>> 4: bf 12 00 00 00 00 00 00 r2 = r1 > >>> 5: 0f 32 00 00 00 00 00 00 r2 += r3 > >>> 6: 61 22 00 00 00 00 00 00 r2 = *(u32 *)(r2 + 0) > >>> 7: 85 00 00 00 09 00 00 00 call 9 > >>> 8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > >>> 10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > >>> 11: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > >>> 13: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > >>> > >>> 0000000000000070 <LBB0_3>: > >>> 14: b7 00 00 00 00 00 00 00 r0 = 0 > >>> 15: 95 00 00 00 00 00 00 00 exit > >>> > >>> The good code generation of LLVM 11.1.0: > >>> > >>> 0000000000000000 <h>: > >>> 0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0) > >>> 1: 25 02 08 00 01 00 00 00 if r2 > 1 goto +8 <LBB0_2> > >>> 2: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4) > >>> 3: 85 00 00 00 09 00 00 00 call 9 > >>> 4: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > >>> 6: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > >>> 7: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll > >>> 9: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0 > >>> > >>> 0000000000000050 <LBB0_2>: > >>> 10: b7 00 00 00 00 00 00 00 r0 = 0 > >>> 11: 95 00 00 00 00 00 00 00 exit > >>> > >>> A bisect points me to this commit in LLVM as the first bad commit: > >>> > >>> commit 54d9f743c8b0f501288119123cf1828bf7ade69c > >>> Author: Yonghong Song <yhs@xxxxxx> > >>> Date: Wed Sep 2 22:56:41 2020 -0700 > >>> > >>> BPF: move AbstractMemberAccess and PreserveDIType passes to > >>> EP_EarlyAsPossible > >>> > >>> Move abstractMemberAccess and PreserveDIType passes as early as > >>> possible, right after clang code generation. > >>> > >>> [...] > >>> > >>> Differential Revision: https://reviews.llvm.org/D87153 > >>> > >>> YiFei Zhu > >>>