On Tue, Feb 8, 2022 at 10:28 AM Vincent Li <vincent.mc.li@xxxxxxxxx> wrote: > > On Mon, Feb 7, 2022 at 10:47 PM Yonghong Song <yhs@xxxxxx> wrote: > > > > > > > > On 2/4/22 1:22 PM, Yonghong Song wrote: > > > > > > > > > On 2/4/22 11:39 AM, Vincent Li wrote: > > >> On Fri, Feb 4, 2022 at 10:04 AM Yonghong Song <yhs@xxxxxx> wrote: > > >>> > > >>> > > >>> > > >>> On 2/4/22 3:11 AM, Timo Beckers wrote: > > >>>> On 2/3/22 03:11, Yonghong Song wrote: > > >>>>> > > >>>>> > > >>>>> On 2/2/22 5:47 AM, Timo Beckers wrote: > > >>>>>> On 2/2/22 08:17, Yonghong Song wrote: > > >>>>>>> > > >>>>>>> > > >>>>>>> On 2/1/22 10:07 AM, Vincent Li wrote: > > >>>>>>>> On Fri, Jan 28, 2022 at 10:27 AM Vincent Li > > >>>>>>>> <vincent.mc.li@xxxxxxxxx> wrote: > > >>>>>>>>> > > >>>>>>>>> On Thu, Jan 27, 2022 at 5:50 PM Yonghong Song <yhs@xxxxxx> wrote: > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> On 1/25/22 12:32 PM, Vincent Li wrote: > > >>>>>>>>>>> On Tue, Jan 25, 2022 at 9:52 AM Vincent Li > > >>>>>>>>>>> <vincent.mc.li@xxxxxxxxx> wrote: > > >>>>>>>>>>>> > > >>>>>>>>>>>> this is macro I suspected in my implementation that could > > >>>>>>>>>>>> cause issue with BTF > > >>>>>>>>>>>> > > >>>>>>>>>>>> #define ENABLE_VTEP 1 > > >>>>>>>>>>>> #define VTEP_ENDPOINT (__u32[]){0xec48a90a, 0xee48a90a, > > >>>>>>>>>>>> 0x1f48a90a, > > >>>>>>>>>>>> 0x2048a90a, } > > >>>>>>>>>>>> #define VTEP_MAC (__u64[]){0x562e984c3682, 0x582e984c3682, > > >>>>>>>>>>>> 0x5eaaed93fdf2, 0x5faaed93fdf2, } > > >>>>>>>>>>>> #define VTEP_NUMS 4 > > >>>>>>>>>>>> > > >>>>>>>>>>>> On Tue, Jan 25, 2022 at 9:38 AM Vincent Li > > >>>>>>>>>>>> <vincent.mc.li@xxxxxxxxx> wrote: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Hi > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> While developing Cilium VTEP integration feature > > >>>>>>>>>>>>> https://github.com/cilium/cilium/pull/17370, I found a > > >>>>>>>>>>>>> strange issue > > >>>>>>>>>>>>> that seems related to BTF and probably caused by my specific > > >>>>>>>>>>>>> implementation, the issue is described in > > >>>>>>>>>>>>> https://github.com/cilium/cilium/issues/18616, I don't know > > >>>>>>>>>>>>> much about > > >>>>>>>>>>>>> BTF and not sure if my implementation is seriously flawed > > >>>>>>>>>>>>> or just some > > >>>>>>>>>>>>> implementation bug or maybe not compatible with BTF. > > >>>>>>>>>>>>> Strangely, the > > >>>>>>>>>>>>> issue appears related to number of VTEPs I use, no problem > > >>>>>>>>>>>>> with 1 or 2 > > >>>>>>>>>>>>> VTEP, 3, 4 VTEPs will have problem with BTF, any guidance > > >>>>>>>>>>>>> from BTF > > >>>>>>>>>>>>> experts are appreciated :-). > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Thanks > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Vincent > > >>>>>>>>>>> > > >>>>>>>>>>> Sorry for previous top post > > >>>>>>>>>>> > > >>>>>>>>>>> it looks the compiler compiles the cilium bpf_lxc.c to bpf_lxc.o > > >>>>>>>>>>> differently and added " [21] .rodata.cst32 PROGBITS > > >>>>>>>>>>> 0000000000000000 00011e68" when following macro exceeded 2 > > >>>>>>>>>>> members > > >>>>>>>>>>> > > >>>>>>>>>>> #define VTEP_ENDPOINT (__u32[]){0xec48a90a, 0xee48a90a, > > >>>>>>>>>>> 0x1f48a90a, > > >>>>>>>>>>> 0x2048a90a, } > > >>>>>>>>>>> > > >>>>>>>>>>> no ".rodata.cst32" compiled in bpf_lxc.o when above > > >>>>>>>>>>> VTEP_ENDPOINT > > >>>>>>>>>>> member <=2. any reason why compiler would do that? > > >>>>>>>>>> > > >>>>>>>>>> Regarding to why compiler generates .rodata.cst32, the reason is > > >>>>>>>>>> you have some 32-byte constants which needs to be saved > > >>>>>>>>>> somewhere. > > >>>>>>>>>> For example, > > >>>>>>>>>> > > >>>>>>>>>> $ cat t.c > > >>>>>>>>>> struct t { > > >>>>>>>>>> long c[2]; > > >>>>>>>>>> int d[4]; > > >>>>>>>>>> }; > > >>>>>>>>>> struct t g; > > >>>>>>>>>> int test() > > >>>>>>>>>> { > > >>>>>>>>>> struct t tmp = {.c = {1, 2}, .d = {3, 4}}; > > >>>>>>>>>> g = tmp; > > >>>>>>>>>> return 0; > > >>>>>>>>>> } > > >>>>>>>>>> > > >>>>>>>>>> $ clang -target bpf -O2 -c t.c > > >>>>>>>>>> $ llvm-readelf -S t.o > > >>>>>>>>>> ... > > >>>>>>>>>> [ 4] .rodata.cst32 PROGBITS 0000000000000000 > > >>>>>>>>>> 0000a8 000020 > > >>>>>>>>>> 20 AM 0 0 8 > > >>>>>>>>>> ... > > >>>>>>>>>> > > >>>>>>>>>> In the above code, if you change the struct size, say from 32 > > >>>>>>>>>> bytes to > > >>>>>>>>>> 40 bytes, the rodata.cst32 will go away. > > >>>>>>>>> > > >>>>>>>>> Thanks Yonghong! I guess it is cilium/ebpf needs to recognize > > >>>>>>>>> rodata.cst32 then > > >>>>>>>> > > >>>>>>>> Hi Yonghong, > > >>>>>>>> > > >>>>>>>> Here is a follow-up question, it looks cilium/ebpf parse vmlinux > > >>>>>>>> and > > >>>>>>>> stores BTF type info in btf.Spec.namedTypes, but the elf object > > >>>>>>>> file > > >>>>>>>> provided by user may have section like rodata.cst32 generated by > > >>>>>>>> compiler that does not have accompanying BTF type info stored in > > >>>>>>>> btf.Spec.NamedTypes for the rodata.cst32, how vmlinux can be > > >>>>>>>> guaranteed to have every BTF type info from application/user > > >>>>>>>> provided > > >>>>>>>> elf object file ? I guess there is no guarantee. > > >>>>>>> > > >>>>>>> vmlinux holds kernel types. rodata.cst32 holds data. If the type of > > >>>>>>> rodata.cst32 needs to be emitted, the type will be encoded in bpf > > >>>>>>> program BTF. > > >>>>>>> > > >>>>>>> Did you actually find an issue with .rodata.cst32 section? Such a > > >>>>>>> section is typically generated by the compiler for initial data > > >>>>>>> inside the function and llvm bpf backend tries to inline the > > >>>>>>> values through a bunch of load instructions. So even you see > > >>>>>>> .rodata.cst32, typically you can safely ignore it. > > >>>>>>> > > >>>>>>>> > > >>>>>>>> Vincent > > >>>>>>> > > >>>>>> > > >>>>>> Hi Yonghong, > > >>>>>> > > >>>>>> Thanks for the reproducer. Couldn't figure out what to do with > > >>>>>> .rodata.cst32, > > >>>>>> since there are no symbols and no BTF info for that section. > > >>>>>> > > >>>>>> The values found in .rodata.cst32 are indeed inlined in the > > >>>>>> bytecode as you > > >>>>>> mentioned, so it seems like we can ignore it. > > >>>>>> > > >>>>>> Why does the compiler emit these sections? cilium/ebpf assumed up > > >>>>>> until now > > >>>>>> that all sections starting with '.rodata' are datasecs and must be > > >>>>>> loaded into > > >>>>>> the kernel, which of course needs accompanying BTF. > > >>>>> > > >>>>> The clang frontend emits these .rodata.* sections. In early days, > > >>>>> kernel > > >>>>> doesn't support global data so llvm bpf backend implements an > > >>>>> optimization to inline these values. But llvm bpf backend didn't > > >>>>> completely remove them as the backend doesn't have a global view > > >>>>> whether these .rodata.* are being used in other places or not. > > >>>>> > > >>>>> Now, llvm bpf backend has better infrastructure and we probably can > > >>>>> implement an IR pass to detect all uses of .rodata.*, inline these > > >>>>> uses, and remove the .rodata.* global variable. > > >>>>> > > >>>>> You can check relocation section of the program text. If the .rodata.* > > >>>>> section is referenced, you should preserve it. Otherwise, you can > > >>>>> ignore that .rodata.* section. > > >>>>> > > >>>>>> > > >>>>>> What other .rodata.* should we expect? > > >>>>> > > >>>>> Glancing through llvm code, you may see .rodata.{4,8,16,32}, > > >>>>> .rodata.str*. > > >>>>> > > >>>>>> > > >>>>>> Thanks, > > >>>>>> > > >>>>>> Timo > > >>>> > > >>>> Thanks for the replies all, very insightful. We were already doing > > >>>> things mostly > > >>>> right wrt. .rodata.*, but found a few subtle bugs walking through > > >>>> the code again. > > >>>> > > >>>> I've gotten a hold of the ELF Vincent was trying to load, and I saw > > >>>> a few things > > >>>> that I found unusual. In his case, the values in cst32 are not > > >>>> inlined. Instead, > > >>>> this ELF has a .Lconstinit symbol pointing at the start of > > >>>> .rodata.cst32, and it's > > >>>> an STT_OBJECT with STB_LOCAL. Our relocation handler is fairly > > >>>> strict and requires > > >>>> STT_OBJECTs to be global (for supporting non-static consts). > > >>> > > >>> There are two ways to resolve the issue. First, extend the loader > > >>> support to handle STB_LOCAL as well. Or Second, change the code like > > >>> struct t v = {1, 5, 29, ...}; > > >>> to > > >>> struct t v; > > >>> __builtin_memset(&v, 0, sizeof(struct t)); > > >>> v.field1 = ...; > > >>> v.field2 = ...; > > >>> > > >>> > > >>>> > > >>>> --- > > >>>> ~ llvm-readelf -ar bpf_lxc.o > > >>>> > > >>>> Symbol table '.symtab' contains 606 entries: > > >>>> Num: Value Size Type Bind Vis Ndx Name > > >>>> 2: 0000000000000000 32 OBJECT LOCAL DEFAULT 21 > > >>>> .Lconstinit > > >>>> > > >>>> Relocation section '.rel2/7' at offset 0x6bdf0 contains 173 entries: > > >>>> Offset Info Type > > >>>> Symbol's Value Symbol's Name > > >>>> 0000000000007300 0000000200000001 R_BPF_64_64 > > >>>> 0000000000000000 .Lconstinit > > >>>> --- > > >>>> > > >>>> --- > > >>>> ~ llvm-objdump -S -r -j 2/7 -j .rodata.cst32 bpf_lxc.o > > >>>> warning: failed to compute relocation: R_BPF_64_64, Invalid data was > > >>>> encountered while parsing the file > > >>>> ... <2 more of these> ... > > >>>> > > >>>> Disassembly of section 2/7: > > >>>> > > >>>> 00000000000072f8 <LBB1_476>: > > >>>> 3679: 67 08 00 00 03 00 00 00 r8 <<= 3 > > >>>> 3680: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 > > >>>> = 0 ll > > >>>> 0000000000007300: R_BPF_64_64 .Lconstinit > > >>>> 3682: 0f 82 00 00 00 00 00 00 r2 += r8 > > >>>> 3683: 79 22 00 00 00 00 00 00 r2 = *(u64 *)(r2 + 0) > > >>>> 3684: 7b 2a 58 ff 00 00 00 00 *(u64 *)(r10 - 168) = r2 > > >>>> > > >>>> Disassembly of section .rodata.cst32: > > >>>> > > >>>> 0000000000000000 <.Lconstinit>: > > >>>> 0: 82 36 4c 98 2e 56 00 00 <unknown> > > >>>> 1: 82 36 4c 98 2e 55 00 00 <unknown> > > >>>> --- > > >>>> > > >>>> > > >>>> This symbol doesn't exist in the program. Worth noting is that the > > >>>> code that accesses > > >>>> this static data sits within a subscope, but not sure what the > > >>>> effect of this would be. > > >>>> > > >>>> Vincent, maybe try removing the enclosing {} to see if that changes > > >>>> anything? > > >>>> > > >>>> --- > > >>>> static __always_inline int foo(struct __ctx_buff *ctx, > > >>>> > > >>>> ... <snip> ... > > >>>> > > >>>> { > > >>>> int i; > > >>>> > > >>>> for (i = 0; i < VTEP_NUMS; i) { > > >>>> if (tunnel_endpoint == VTEP_ENDPOINT[i]) { > > >>>> vtep_mac = VTEP_MAC[i]; > > >>>> break; > > >>>> } > > >>>> } > > >>>> } > > >>>> --- > > >>>> > > >>>> Is this perhaps something that needs to be addressed in the compiler? > > >>> > > >>> If you can give a reproducible test (with .c or .i file), I can take a > > >>> look at what is missing in llvm compiler and improve it. > > >>> > > >> > > >> not sure if it would help, here is my step to generate the bpf_lxc.o > > >> object file with the .rodata.cst32 > > >> > > >> git clone https://github.com/f5devcentral/cilium.git > > >> cd cilium; git checkout vli-vxlan; KERNEL=54 make -C bpf > > >> llvm-objdump -S -r -j 2/7 -j .rodata.cst32 bpf/bpf_lxc.o > > > > > > Thanks. I can reproduce the issue now. Will take a look > > > and get back to you as soon as I got any concrete results. > > > > Okay, I found the reason. > > > > For the code, > > > > for (i = 0; i < VTEP_NUMS; i++) { > > if (tunnel_endpoint == VTEP_ENDPOINT[i]) { > > vtep_mac = VTEP_MAC[i]; > > break; > > } > > } > > > > The compiler transformed to something like > > > > i = 0; if (tunnerl_endpoint == VTEP_ENDPOINT[0]) goto end; > > i = 1; if (tunnerl_endpoint == VTEP_ENDPOINT[1]) goto end; > > i = 2; if (tunnerl_endpoint == VTEP_ENDPOINT[2]) goto end; > > i = 3; if (tunnerl_endpoint == VTEP_ENDPOINT[3]) goto end; > > > > end: > > vtep_mac = VTEP_MAC[i]; > > > > The compiler cannot inline VTEP_MAC[i] since 'i' is not > > a constant. Hence later we have a memory load from > > a non-global .rodata section. > > > > As I mentioned earlier, there are two options to fix the issue. > > First is for cilium to track and handle non-global .rodata > > sections. And the second you can apply the below code change, > > > > diff --git a/bpf/node_config.h b/bpf/node_config.h > > index 9783e44548..b80dd2b27b 100644 > > --- a/bpf/node_config.h > > +++ b/bpf/node_config.h > > @@ -176,15 +176,15 @@ DEFINE_IPV6(HOST_IP, 0xbe, 0xef, 0x0, 0x0, 0x0, > > 0x0, 0x0, 0x1, 0x0, 0x0, 0xa, 0x > > #endif > > > > #ifdef ENABLE_VTEP > > -#define VTEP_ENDPOINT (__u32[]){0xeb48a90a, 0xec48a90a, 0xed48a90a, > > 0xee48a90a, } > > +#define VTEP_NUMS 4 > > +__u32 VTEP_ENDPOINT[VTEP_NUMS] = {0xeb48a90a, 0xec48a90a, 0xed48a90a, > > 0xee48a90a}; > > /* HEX representation of VTEP IP > > * 10.169.72.235, 10.169.72.236, 10.169.72.237, 10.169.72.238 > > */ > > -#define VTEP_MAC (__u64[]){0x562e984c3682, 0x552e984c3682, > > 0x542e984c3682, 0x532e984c3682} > > +__u64 VTEP_MAC[VTEP_NUMS] = {0x562e984c3682, 0x552e984c3682, > > 0x542e984c3682, 0x532e984c3682}; > > /* VTEP MAC address > > * 82:36:4c:89:2e:56, 82:36:4c:89:2e:55, 82:36:4c:89:2e:54, > > 82:36:4c:89:2e:53 > > */ > > -#define VTEP_NUMS 4 > > #endif > > > I may misunderstand you, I thought your suggestion would stop compiler generating .rodata.cst32, but it appears the compiler still generated .rodata.cst32 after applying your changes readelf -e bpf/bpf_lxc.o | grep 'rodata' [51] .rodata.cst32 PROGBITS 0000000000000000 00045f48 > Thank you Yonghong for the suggestion, the original code is kind of > hack from me to work around some issue :). now we decided to abandon > above code and use BPF hash map for VTEP lookup in Cilium to avoid > this issue and to be more flexible, so it is up to cilium/ebpf to > decide to address the rodata or not. > > > /* It appears that we can support around the below number of prefixes > > in an > > > > > > > >> > > >>>> > > >>>> > > >>>> Thanks again, > > >>>> > > >>>> Timo