On Mon, Dec 16, 2024 at 03:19:01PM +0000, Alan Maguire wrote: > On 14/12/2024 12:15, Alan Maguire wrote: > > On 14/12/2024 04:41, Cong Wang wrote: > >> On Thu, Dec 05, 2024 at 08:36:33AM +0100, Uros Bizjak wrote: > >>> On Wed, Dec 4, 2024 at 4:52 PM Laura Nao <laura.nao@xxxxxxxxxxxxx> wrote: > >>>> > >>>> On 11/15/24 18:17, Laura Nao wrote: > >>>>> I managed to reproduce the issue locally and I've uploaded the vmlinux[1] > >>>>> (stripped of DWARF data) and vmlinux.raw[2] files, as well as one of the > >>>>> modules[3] and its btf data[4] extracted with: > >>>>> > >>>>> bpftool -B vmlinux btf dump file cros_kbd_led_backlight.ko > cros_kbd_led_backlight.ko.raw > >>>>> > >>>>> Looking again at the logs[5], I've noticed the following is reported: > >>>>> > >>>>> [ 0.415885] BPF: type_id=115803 offset=177920 size=1152 > >>>>> [ 0.416029] BPF: > >>>>> [ 0.416083] BPF: Invalid offset > >>>>> [ 0.416165] BPF: > >>>>> > >>>>> There are two different definitions of rcu_data in '.data..percpu', one > >>>>> is a struct and the other is an integer: > >>>>> > >>>>> type_id=115801 offset=177920 size=1152 (VAR 'rcu_data') > >>>>> type_id=115803 offset=177920 size=1152 (VAR 'rcu_data') > >>>>> > >>>>> [115801] VAR 'rcu_data' type_id=115572, linkage=static > >>>>> [115803] VAR 'rcu_data' type_id=1, linkage=static > >>>>> > >>>>> [115572] STRUCT 'rcu_data' size=1152 vlen=69 > >>>>> [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none) > >>>>> > >>>>> I assume that's not expected, correct? > >>>>> > >>>>> I'll dig a bit deeper and report back if I can find anything else. > >>>> > >>>> I ran a bisection, and it appears the culprit commit is: > >>>> https://lore.kernel.org/all/20241021080856.48746-2-ubizjak@xxxxxxxxx/ > >>>> > >>>> Hi Uros, do you have any suggestions or insights on resolving this issue? > >>> > >>> There is a stray ";" at the end of the #define, perhaps this makes a difference: > >>> > >>> +#define PERCPU_PTR(__p) \ > >>> + (typeof(*(__p)) __force __kernel *)(__p); > >>> + > >>> > >>> and SHIFT_PERCPU_PTR macro now expands to: > >>> > >>> RELOC_HIDE((typeof(*(p)) __force __kernel *)(p);, (offset)) > >>> > >>> A follow-up patch in the series changes PERCPU_PTR macro to: > >>> > >>> #define PERCPU_PTR(__p) \ > >>> ({ \ > >>> unsigned long __pcpu_ptr = (__force unsigned long)(__p); \ > >>> (typeof(*(__p)) __force __kernel *)(__pcpu_ptr); \ > >>> }) > >>> > >>> so this should again correctly cast the value. > >> > >> Hm, I saw a similar bug but with pahole 1.28. My kernel complains about > >> BTF invalid offset: > >> > >> [ 7.785788] BPF: type_id=2394 offset=0 size=1 > >> [ 7.786411] BPF: > >> [ 7.786703] BPF: Invalid offset > >> [ 7.787119] BPF: > >> > >> Dumping the vmlinux (there is no module invovled), I saw it is related to > >> percpu pointer too: > >> > >> [2394] VAR '__pcpu_unique_cpu_hw_events' type_id=2, linkage=global > >> ... > >> [163643] DATASEC '.data..percpu' size=2123280 vlen=808 > >> type_id=2393 offset=0 size=1 (VAR '__pcpu_scope_cpu_hw_events') > >> type_id=2394 offset=0 size=1 (VAR '__pcpu_unique_cpu_hw_events') > >> ... > >> > >> I compiled and installed the latest pahole from its git repo: > >> > >> $ pahole --version > >> v1.28 > >> > >> Thanks. > > > > Thanks for the report! Looking at percpu-defs.h it looks like the > > existence of such variables requires either > > > > #if defined(ARCH_NEEDS_WEAK_PER_CPU) || > > defined(CONFIG_DEBUG_FORCE_WEAK_PER_CPU) > > > > ... > > > > #define DEFINE_PER_CPU_SECTION(type, name, sec) \ > > __PCPU_DUMMY_ATTRS char __pcpu_scope_##name; \ > > extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \ > > __PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \ > > extern __PCPU_ATTRS(sec) __typeof__(type) name; \ > > __PCPU_ATTRS(sec) __weak __typeof__(type) name > > > > > > I'm guessing your .config has CONFIG_DEBUG_FORCE_WEAK_PER_CPU, or are > > you building on s390/alpha? > > > > I've reproduced this on bpf-next with CONFIG_DEBUG_FORCE_WEAK_PER_CPU=y, > > pahole v1.28 and gcc-12; I see ~900 __pcpu_ variables and get the same > > BTF errors since multipe __pcpu_ vars share the offset 0. > > > > A simple workaround in dwarves - and I verified this resolved the issue > > for me - would be > > > > diff --git a/btf_encoder.c b/btf_encoder.c > > index 3754884..4a1799a 100644 > > --- a/btf_encoder.c > > +++ b/btf_encoder.c > > @@ -2174,7 +2174,8 @@ static bool filter_variable_name(const char *name) > > X("__UNIQUE_ID"), > > X("__tpstrtab_"), > > X("__exitcall_"), > > - X("__func_stack_frame_non_standard_") > > + X("__func_stack_frame_non_standard_"), > > + X("__pcpu_") > > #undef X > > }; > > int i; > > > > ...but I'd like us to understand further why variables which were > > supposed to be in a .discard section end up being encoded as there may > > be other problems lurking here aside from this one. More soon hopefully... > > > > > A bit more context here - variable encoding takes the address of the > variable from DWARF to locate the associated ELF section. Because we > insist on having a variable specification - with a location - this > usually works fine. However the problem is that because these dummy > __pcpu_ variables specify a .discard section, their addresses are 0, so > we get for example: > > <1><1e535>: Abbrev Number: 114 (DW_TAG_variable) > <1e536> DW_AT_name : (indirect string, offset: 0x5e97): > __pcpu_unique_kstack_offset > <1e53a> DW_AT_decl_file : 1 > <1e53b> DW_AT_decl_line : 823 > <1e53d> DW_AT_decl_column : 1 > <1e53e> DW_AT_type : <0x57> > <1e542> DW_AT_external : 1 > <1e542> DW_AT_declaration : 1 > <1><1e542>: Abbrev Number: 156 (DW_TAG_variable) > <1e544> DW_AT_specification: <0x1e535> > <1e548> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 > (DW_OP_addr: 0) > > > You can see the same thing for a simple program like this: > > #include <stdio.h> > > #define SEC(name) __attribute__((section(name))) > > SEC("/DISCARD/") int d1; > extern int d1; > SEC("/DISCARD/") int d2; > extern int d2; > > int main(int argc, char *argv[]) > { > return 0; > } > > > If you compile it with -g, the DWARF shows that d1 and d2 both have > address 0: > > <1><72>: Abbrev Number: 5 (DW_TAG_variable) > <73> DW_AT_name : d1 > <76> DW_AT_decl_file : 1 > <77> DW_AT_decl_line : 5 > <78> DW_AT_decl_column : 22 > <79> DW_AT_type : <0x57> > <7d> DW_AT_external : 1 > <7d> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 > (DW_OP_addr: 0) > <1><87>: Abbrev Number: 5 (DW_TAG_variable) > <88> DW_AT_name : d2 > <8b> DW_AT_decl_file : 1 > <8c> DW_AT_decl_line : 7 > <8d> DW_AT_decl_column : 22 > <8e> DW_AT_type : <0x57> > <92> DW_AT_external : 1 > <92> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 > (DW_OP_addr: 0) > > > So the reason this happens for dwarves v1.28 in particular is - as I > understand it - we moved away from recording ELF section information for > each variable and matching that with DWARF info, instead relying on the > address to locate the associated ELF section. In cases like the above > the address information unfortunately leads us astray. > > Seems like there's a few approaches we can take in fixing this: > > 1. designate "__pcpu_" prefix as a variable prefix to filter out. This > resolves the immediate problem but is too narrowly focused IMO and we > may end up playing whack-a-mole with other dummy variable prefixes. > 2. resurrect ELF section variable information fully; i.e. record a list > of variables per ELF section (or at least per ELF section we care > about). If variable is not on the list for the ELF section, do not > encode it. > 3. midway between the two; for the 0 address case specifically, verify > that the variable name really _is_ in the associated ELF section. No > need to create a local ELF table variable representation, we could just > walk the table in the case of the 0 addresses. > > Diff for approach 3 is as follows > > diff --git a/btf_encoder.c b/btf_encoder.c > index 3754884..21a0ab6 100644 > --- a/btf_encoder.c > +++ b/btf_encoder.c > @@ -2189,6 +2189,26 @@ static bool filter_variable_name(const char *name) > return false; > } > > +bool variable_in_sec(struct btf_encoder *encoder, const char *name, > size_t shndx) > +{ > + uint32_t sym_sec_idx; > + uint32_t core_id; > + GElf_Sym sym; > + > + elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym, > sym_sec_idx) { > + const char *sym_name; > + > + if (sym_sec_idx != shndx || elf_sym__type(&sym) != > STT_OBJECT) > + continue; > + sym_name = elf_sym__name(&sym, encoder->symtab); > + if (!sym_name) > + continue; > + if (strcmp(name, sym_name) == 0) > + return true; > + } > + return false; > +} > + > static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder) > { > struct cu *cu = encoder->cu; > @@ -2258,6 +2278,11 @@ static int > btf_encoder__encode_cu_variables(struct btf_encoder *encoder) > if (filter_variable_name(name)) > continue; > > + /* A 0 address may be in a .discard section; ensure the > + * variable really is in this section by checking ELF > symtab. > + */ > + if (addr == 0 && !variable_in_sec(encoder, name, shndx)) > + continue; > /* Check for invalid BTF names */ > if (!btf_name_valid(name)) { > dump_invalid_symbol("Found invalid variable name > when encoding btf", > > > ...so slightly more complex than option 1, but a bit more general in its > applicability to .discard section variables. > > For the pahole folks, what do we think? Which option (or indeed other > ones I haven't thought of) makes sense for a fix for this? Thanks! I can reproduce this with the CONFIG_DEBUG_FORCE_WEAK_PER_CPU enable, the fix looks fine, could you send formal patch? thanks, jirka