On 14/12/2024 04:41, Cong Wang wrote: > On Thu, Dec 05, 2024 at 08:36:33AM +0100, Uros Bizjak wrote: >> On Wed, Dec 4, 2024 at 4:52 PM Laura Nao <laura.nao@xxxxxxxxxxxxx> wrote: >>> >>> On 11/15/24 18:17, Laura Nao wrote: >>>> I managed to reproduce the issue locally and I've uploaded the vmlinux[1] >>>> (stripped of DWARF data) and vmlinux.raw[2] files, as well as one of the >>>> modules[3] and its btf data[4] extracted with: >>>> >>>> bpftool -B vmlinux btf dump file cros_kbd_led_backlight.ko > cros_kbd_led_backlight.ko.raw >>>> >>>> Looking again at the logs[5], I've noticed the following is reported: >>>> >>>> [ 0.415885] BPF: type_id=115803 offset=177920 size=1152 >>>> [ 0.416029] BPF: >>>> [ 0.416083] BPF: Invalid offset >>>> [ 0.416165] BPF: >>>> >>>> There are two different definitions of rcu_data in '.data..percpu', one >>>> is a struct and the other is an integer: >>>> >>>> type_id=115801 offset=177920 size=1152 (VAR 'rcu_data') >>>> type_id=115803 offset=177920 size=1152 (VAR 'rcu_data') >>>> >>>> [115801] VAR 'rcu_data' type_id=115572, linkage=static >>>> [115803] VAR 'rcu_data' type_id=1, linkage=static >>>> >>>> [115572] STRUCT 'rcu_data' size=1152 vlen=69 >>>> [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none) >>>> >>>> I assume that's not expected, correct? >>>> >>>> I'll dig a bit deeper and report back if I can find anything else. >>> >>> I ran a bisection, and it appears the culprit commit is: >>> https://lore.kernel.org/all/20241021080856.48746-2-ubizjak@xxxxxxxxx/ >>> >>> Hi Uros, do you have any suggestions or insights on resolving this issue? >> >> There is a stray ";" at the end of the #define, perhaps this makes a difference: >> >> +#define PERCPU_PTR(__p) \ >> + (typeof(*(__p)) __force __kernel *)(__p); >> + >> >> and SHIFT_PERCPU_PTR macro now expands to: >> >> RELOC_HIDE((typeof(*(p)) __force __kernel *)(p);, (offset)) >> >> A follow-up patch in the series changes PERCPU_PTR macro to: >> >> #define PERCPU_PTR(__p) \ >> ({ \ >> unsigned long __pcpu_ptr = (__force unsigned long)(__p); \ >> (typeof(*(__p)) __force __kernel *)(__pcpu_ptr); \ >> }) >> >> so this should again correctly cast the value. > > Hm, I saw a similar bug but with pahole 1.28. My kernel complains about > BTF invalid offset: > > [ 7.785788] BPF: type_id=2394 offset=0 size=1 > [ 7.786411] BPF: > [ 7.786703] BPF: Invalid offset > [ 7.787119] BPF: > > Dumping the vmlinux (there is no module invovled), I saw it is related to > percpu pointer too: > > [2394] VAR '__pcpu_unique_cpu_hw_events' type_id=2, linkage=global > ... > [163643] DATASEC '.data..percpu' size=2123280 vlen=808 > type_id=2393 offset=0 size=1 (VAR '__pcpu_scope_cpu_hw_events') > type_id=2394 offset=0 size=1 (VAR '__pcpu_unique_cpu_hw_events') > ... > > I compiled and installed the latest pahole from its git repo: > > $ pahole --version > v1.28 > > Thanks. Thanks for the report! Looking at percpu-defs.h it looks like the existence of such variables requires either #if defined(ARCH_NEEDS_WEAK_PER_CPU) || defined(CONFIG_DEBUG_FORCE_WEAK_PER_CPU) ... #define DEFINE_PER_CPU_SECTION(type, name, sec) \ __PCPU_DUMMY_ATTRS char __pcpu_scope_##name; \ extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \ __PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \ extern __PCPU_ATTRS(sec) __typeof__(type) name; \ __PCPU_ATTRS(sec) __weak __typeof__(type) name I'm guessing your .config has CONFIG_DEBUG_FORCE_WEAK_PER_CPU, or are you building on s390/alpha? I've reproduced this on bpf-next with CONFIG_DEBUG_FORCE_WEAK_PER_CPU=y, pahole v1.28 and gcc-12; I see ~900 __pcpu_ variables and get the same BTF errors since multipe __pcpu_ vars share the offset 0. A simple workaround in dwarves - and I verified this resolved the issue for me - would be diff --git a/btf_encoder.c b/btf_encoder.c index 3754884..4a1799a 100644 --- a/btf_encoder.c +++ b/btf_encoder.c @@ -2174,7 +2174,8 @@ static bool filter_variable_name(const char *name) X("__UNIQUE_ID"), X("__tpstrtab_"), X("__exitcall_"), - X("__func_stack_frame_non_standard_") + X("__func_stack_frame_non_standard_"), + X("__pcpu_") #undef X }; int i; ...but I'd like us to understand further why variables which were supposed to be in a .discard section end up being encoded as there may be other problems lurking here aside from this one. More soon hopefully... Alan