Re: BPF CO-RE and array fields in context struct

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 22, 2021 at 4:24 PM Yonghong Song <yhs@xxxxxx> wrote:
> On 11/22/21 12:44 PM, YiFei Zhu wrote:
> > On Mon, Nov 22, 2021 at 8:19 AM YiFei Zhu <zhuyifei@xxxxxxxxxx> wrote:
> >>
> >> Hi
> >>
> >> I've been investigating the use of BPF CO-RE. I discovered that if I
> >> include vmlinux.h and have all structures annotated with
> >> __attribute__((preserve_access_index)), including the context struct,
> >> then a prog that accesses an array field in the context struct, in
> >> some particular way, cannot pass the verifier.
> >>
> >> A bunch of manual reduction plus creduce gives me this output:
> >>
> >>    struct bpf_sock_ops {
> >>      int family;
> >>      int remote_ip6[];
> >>    } __attribute__((preserve_access_index));
> >>    __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) {
> >>      int a = d->family;
> >>      int *c = d->remote_ip6;
> >>      c[2] = a;
> >>      return 0;
> >>    }
> >>
> >> With Debian clang version 11.1.0-4+build1, this compiles to
> >>
> >>    0000000000000000 <b>:
> >>           0: b7 02 00 00 04 00 00 00 r2 = 4
> >>           1: bf 13 00 00 00 00 00 00 r3 = r1
> >>           2: 0f 23 00 00 00 00 00 00 r3 += r2
> >>           3: 61 11 00 00 00 00 00 00 r1 = *(u32 *)(r1 + 0)
> >>           4: 63 13 08 00 00 00 00 00 *(u32 *)(r3 + 8) = r1
> >>           5: b7 00 00 00 00 00 00 00 r0 = 0
> >>           6: 95 00 00 00 00 00 00 00 exit
> >>
> >> And the prog will be rejected with this verifier log:
> >>
> >>    ; __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) {
> >>    0: (b7) r2 = 32
> >>    1: (bf) r3 = r1
> >>    2: (0f) r3 += r2
> >>    last_idx 2 first_idx 0
> >>    regs=4 stack=0 before 1: (bf) r3 = r1
> >>    regs=4 stack=0 before 0: (b7) r2 = 32
> >>    ; int a = d->family;
> >>    3: (61) r1 = *(u32 *)(r1 +20)
> >>    ; c[2] = a;
> >>    4: (63) *(u32 *)(r3 +8) = r1
> >>    dereference of modified ctx ptr R3 off=32 disallowed
> >>    processed 5 insns (limit 1000000) max_states_per_insn 0 total_states
> >> 0 peak_states 0 mark_read 0
> >>
> >> Looking at check_ctx_reg() and its callsite at check_mem_access() in
> >> verifier.c, it seems that the verifier really does not like when the
> >> context pointer has an offset, in this case the field offset of
> >> d->remote_ip6.
> >>
> >> I thought this is just an issue with array fields, that field offset
> >> relocations may have trouble expressing two field accesses (one struct
> >> member, one array memory). However, further testing reveals that this
> >> is not the case, because if I simplify out the local variables, the
> >> error is gone:
> >>
> >>    struct bpf_sock_ops {
> >>      int family;
> >>      int remote_ip6[];
> >>    } __attribute__((preserve_access_index));
> >>    __attribute__((section("sockops"))) int b(struct bpf_sock_ops *d) {
> >>      d->remote_ip6[2] = d->family;
> >>      return 0;
> >>    }
> >>
> >> is compiled to:
> >>
> >>    0000000000000000 <b>:
> >>           0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0)
> >>           1: 63 21 0c 00 00 00 00 00 *(u32 *)(r1 + 12) = r2
> >>           2: b7 00 00 00 00 00 00 00 r0 = 0
> >>           3: 95 00 00 00 00 00 00 00 exit
> >>
> >> and is loaded as:
> >>
> >>    ; d->remote_ip6[2] = d->family;
> >>    0: (61) r2 = *(u32 *)(r1 +20)
> >>    ; d->remote_ip6[2] = d->family;
> >>    1: (63) *(u32 *)(r1 +40) = r2
> >>    invalid bpf_context access off=40 size=4
> >>
> >> I believe this error is because d->remote_ip6 is read-only, that this
> >> modification might be more of a product of creduce, but we can see
> >> that the CO-RE adjected offset of the array element from the context
> >> pointer is correct: 32 to remote_ip6, 8 array index, so total offset
> >> is 40.
> >>
> >> Also note that removal of __attribute__((preserve_access_index)) from
> >> the first (miscompiled) program produces exactly the same bytecode as
> >> this new program (with no locals).
> >>
> >> What is going on here? Why does the access of an array in context in
> >> this particular way cause it to generate code that would not pass the
> >> verifier? Is it a bug in Clang/LLVM, or is it the verifier being too
> >> strict?
> >
> > Additionally, testing the latest LLVM main branch, this test case,
> > which does not touch array fields at all, fails but succeeded with
> > clang version 11.1.0:
> >
> >    struct bpf_sock_ops {
> >      int op;
> >      int bpf_sock_ops_cb_flags;
> >    } __attribute__((preserve_access_index));
> >    enum { a, b } static (*c)() = (void *)9;
> >    enum d { e } f;
> >    enum d g;
> >    __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) {
> >      switch (i->op) {
> >      case a:
> >        f = g = c(i, i->bpf_sock_ops_cb_flags);
> >        break;
> >      case b:
> >        f = g = c(i, i->bpf_sock_ops_cb_flags);
> >      }
> >      return 0;
> >    }
>
> This is another issue which actually appears (even in bpf mailing list)
> multiple times.
>
> The following change should fix the issue:
>
>   $ diff t1.c t1-good.c
> --- t1.c        2021-11-22 16:00:13.915921544 -0800
> +++ t1-good.c   2021-11-22 16:12:32.823710102 -0800
> @@ -5,13 +5,14 @@
>     enum { a, b } static (*c)() = (void *)9;
>     enum d { e } f;
>     enum d g;
> +  #define __barrier asm volatile("" ::: "memory")
>     __attribute__((section("sockops"))) int h(struct bpf_sock_ops *i) {
>       switch (i->op) {
>       case a:
> -      f = g = c(i, i->bpf_sock_ops_cb_flags);
> +      f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier;
>         break;
>       case b:
> -      f = g = c(i, i->bpf_sock_ops_cb_flags);
> +      f = g = c(i, i->bpf_sock_ops_cb_flags); __barrier;
>       }
>       return 0;
>     }
>
> Basically add a compiler barrier at the end of case statements
> to prevent common code sinking.
>
> In the above case, for the original code, latest compiler did an
> optimization like
>       case a:
>           tmp = reloc_offset(i->bpf_sock_ops_cb_flags);
>       case b:
>           tmp = reloc_offset(i->bpf_sock_ops_cb_flags);
>     common:
>       val = load r1, tmp
>       ...
>
> Note that reloc_offset is not sinked to the common code
> due to its special internal representation.
>
> To avoid such a code generation, add compiler barrier to
> the end of case statement to prevent load sinking in which case
> we will have
>      val = load r1, reloc_offset(...)
> and verifier will be happy about this.
>
> The commit you listed below had a big change which may enable
> the above compiler optimization and llvm11 may just not do
> the code sinking optimization in this particular instance.
>
> I guess the compiler could still enforce this. But since it does
> not know whether the memory access is for context or not, doing
> so might hurt performance. But any way, this has appeared multiple
> times internally and also in the mailing list. We will take a further
> look.

I see, thanks for the explanations. What is interesting is that prior
to that commit reloc_offset(i->bpf_sock_ops_cb_flags) is generated
only once. The disassembly matches that of
    case a:
    case b:
          tmp = reloc_offset(i->bpf_sock_ops_cb_flags);
          val = load r1, tmp

Whereas with the compiler barriers, both compilers generate (no common code):

  0000000000000000 <h>:
         0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0)
         1: 15 02 0a 00 01 00 00 00 if r2 == 1 goto +10 <LBB0_3>
         2: 55 02 11 00 00 00 00 00 if r2 != 0 goto +17 <LBB0_4>
         3: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4)
         4: 85 00 00 00 09 00 00 00 call 9
         5: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
         7: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
         8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
        10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
        11: 05 00 08 00 00 00 00 00 goto +8 <LBB0_4>

  0000000000000060 <LBB0_3>:
        12: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4)
        13: 85 00 00 00 09 00 00 00 call 9
        14: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
        16: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
        17: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
        19: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0

  00000000000000a0 <LBB0_4>:
        20: b7 00 00 00 00 00 00 00 r0 = 0
        21: 95 00 00 00 00 00 00 00 exit

Did the linked commit create the special internal representation so
that they cannot be common code sinked, or is there some other issue
going on, or am I misunderstanding something?

Thanks
YiFei Zhu
> > The bad code generation of latest LLVM:
> >
> >    0000000000000000 <h>:
> >           0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0)
> >           1: 15 02 01 00 01 00 00 00 if r2 == 1 goto +1 <LBB0_2>
> >           2: 55 02 0b 00 00 00 00 00 if r2 != 0 goto +11 <LBB0_3>
> >
> >    0000000000000018 <LBB0_2>:
> >           3: b7 03 00 00 04 00 00 00 r3 = 4
> >           4: bf 12 00 00 00 00 00 00 r2 = r1
> >           5: 0f 32 00 00 00 00 00 00 r2 += r3
> >           6: 61 22 00 00 00 00 00 00 r2 = *(u32 *)(r2 + 0)
> >           7: 85 00 00 00 09 00 00 00 call 9
> >           8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
> >          10: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
> >          11: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
> >          13: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
> >
> >    0000000000000070 <LBB0_3>:
> >          14: b7 00 00 00 00 00 00 00 r0 = 0
> >          15: 95 00 00 00 00 00 00 00 exit
> >
> > The good code generation of LLVM 11.1.0:
> >
> >    0000000000000000 <h>:
> >           0: 61 12 00 00 00 00 00 00 r2 = *(u32 *)(r1 + 0)
> >           1: 25 02 08 00 01 00 00 00 if r2 > 1 goto +8 <LBB0_2>
> >           2: 61 12 04 00 00 00 00 00 r2 = *(u32 *)(r1 + 4)
> >           3: 85 00 00 00 09 00 00 00 call 9
> >           4: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
> >           6: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
> >           7: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
> >           9: 63 01 00 00 00 00 00 00 *(u32 *)(r1 + 0) = r0
> >
> >    0000000000000050 <LBB0_2>:
> >          10: b7 00 00 00 00 00 00 00 r0 = 0
> >          11: 95 00 00 00 00 00 00 00 exit
> >
> > A bisect points me to this commit in LLVM as the first bad commit:
> >
> >    commit 54d9f743c8b0f501288119123cf1828bf7ade69c
> >    Author: Yonghong Song <yhs@xxxxxx>
> >    Date:   Wed Sep 2 22:56:41 2020 -0700
> >
> >        BPF: move AbstractMemberAccess and PreserveDIType passes to
> > EP_EarlyAsPossible
> >
> >        Move abstractMemberAccess and PreserveDIType passes as early as
> >        possible, right after clang code generation.
> >
> >    [...]
> >
> >        Differential Revision: https://reviews.llvm.org/D87153
> >
> > YiFei Zhu
> >



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux