Re: [REGRESSION] module BTF validation failure (Error -22) on next

On 06/11/2024 16:08, Laura Nao wrote:
> Hello,
> 
> KernelCI has detected a module loading regression affecting all AMD and 
> Intel Chromebooks in the Collabora LAVA lab, occurring between 
> next-20241024 and next-20241025.
> 
> The logs indicate a failure in BTF module validation, preventing all 
> modules from loading correctly (with CONFIG_MODULE_ALLOW_BTF_MISMATCH 
> unset). The example below is from an AMD Chromebook (HP 14b na0052xx), 
> with similar errors observed on other AMD and Intel devices:
> 
> [    5.284373] failed to validate module [cros_kbd_led_backlight] BTF: -22
> [    5.291392] failed to validate module [i2c_hid] BTF: -22
> [    5.293958] failed to validate module [chromeos_pstore] BTF: -22
> [    5.302832] failed to validate module [coreboot_table] BTF: -22
> [    5.309175] failed to validate module [raydium_i2c_ts] BTF: -22
> [    5.309264] failed to validate module [i2c_cros_ec_tunnel] BTF: -22
> [    5.322158] failed to validate module [typec] BTF: -22
> [    5.327554] failed to validate module [snd_timer] BTF: -22
> [    5.327573] failed to validate module [cros_usbpd_notify] BTF: -22
> [    5.339272] failed to validate module [elan_i2c] BTF: -22
> [    5.345821] failed to validate module [industrialio] BTF: -22
> [    5.423113] failed to validate module [cfg80211] BTF: -22
> [    5.443074] failed to validate module [cros_ec_dev] BTF: -22
> [    5.448857] failed to validate module [snd_pci_acp3x] BTF: -22
> [    5.454736] failed to validate module [cros_kbd_led_backlight] BTF: -22
> [    5.461458] failed to validate module [regmap_i2c] BTF: -22
> [    5.470228] failed to validate module [i2c_piix4] BTF: -22
> [    5.491123] failed to validate module [i2c_hid] BTF: -22
> [    5.491226] failed to validate module [chromeos_pstore] BTF: -22
> [    5.496519] failed to validate module [coreboot_table] BTF: -22
> [    5.502632] failed to validate module [snd_timer] BTF: -22
> [    5.538916] failed to validate module [gsmi] BTF: -22
> [    5.604971] failed to validate module [mii] BTF: -22
> [    5.604971] failed to validate module [videobuf2_common] BTF: -22
> [    5.604972] failed to validate module [sp5100_tco] BTF: -22
> [    5.616068] failed to validate module [snd_soc_acpi] BTF: -22
> [    5.680553] failed to validate module [bluetooth] BTF: -22
> [    5.749320] failed to validate module [chromeos_pstore] BTF: -22
> [    5.755440] failed to validate module [mii] BTF: -22
> [    5.760522] failed to validate module [snd_timer] BTF: -22
> [    5.783549] failed to validate module [bluetooth] BTF: -22
> [    5.841561] failed to validate module [mii] BTF: -22
> [    5.846699] failed to validate module [snd_timer] BTF: -22
> [    5.892444] failed to validate module [mii] BTF: -22
> [    5.897708] failed to validate module [snd_timer] BTF: -22
> [    5.945507] failed to validate module [snd_timer] BTF: -22
> 
> The full kernel log is available on [1]. The config used is available on
> [2] and the kernel/modules have been built using gcc-12.
> 
> The issue is still present on next-20241105.
> 
> I'm sending this report to track the regression while a fix is
> identified. The culprit commit hasn't been pinpointed yet, I'll report
> back once it's identified.
> 
> Any feedback or suggestion for additional debugging steps would be greatly 
> appreciated.
> 
> Best,
>

Thanks for the report! Judging from the config, you're seeing this with
pahole v1.24. I have seen issues like this in the past where during a
kernel build, module BTF has been built against vmlinux BTF, and then
something later re-triggers vmlinux BTF generation. If the regenerated
vmlinux BTF does not assign the same type ids to the same types, you get
mismatch errors like those above, because the modules still refer to the
old, now out-of-date type ids in vmlinux. That's only a preliminary
guess, though; we'll need more info to get to the bottom of this.
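
To make the hypothesis concrete, here is a minimal sketch, assuming raw
BTF dumps of vmlinux from before and after a suspected second BTF pass
are both available (old.raw and new.raw are hypothetical file names, and
first_btf_mismatch is a helper invented for illustration). It prints the
first line on which the two dumps disagree; any output means type ids
moved between the two passes.

```shell
# first_btf_mismatch OLD.raw NEW.raw
# Compare two raw BTF dumps line by line and print the first pair of
# lines that differ. Prints nothing if no line of NEW differs from the
# line at the same position in OLD.
# Assumption: both files were produced by "bpftool btf dump file <vmlinux>".
first_btf_mismatch() {
    awk '
        NR == FNR { old[FNR] = $0; next }   # first file: remember each line
        $0 != old[FNR] {                    # second file: compare by position
            print "old: " old[FNR]
            print "new: " $0
            exit
        }
    ' "$1" "$2"
}
```

Usage would be along the lines of "first_btf_mismatch old.raw new.raw".
This only shows that ids shifted; it does not say what re-triggered the
generation.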

A few suggestions to help debug this:

- If you have build logs, check BTF generation of vmlinux. Did it in
fact happen twice, perhaps? Even better, if KernelCI saves the logs,
feel free to send a pointer and I'll take a look.
- Can you post the vmlinux (stripped of DWARF data if possible to limit
size) and one of the failing modules somewhere so we can analyze them?
- Failing that, run
bpftool btf dump file /path/2/vmlinux_from_build > vmlinux.raw
and upload vmlinux.raw along with one of the failing module .ko files.
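
For the first suggestion, a rough sketch of how a build log could be
checked for repeated vmlinux BTF generation. It assumes quiet kbuild
output in which vmlinux BTF steps appear as "  BTF     <object>" and
per-module steps as "  BTF [M] <module>.ko"; the exact format may differ
between kernel versions, so adjust the patterns to match your log. The
function name count_vmlinux_btf_steps is made up for this example.

```shell
# count_vmlinux_btf_steps BUILD_LOG
# Print how many non-module BTF generation steps appear in the log.
# Assumption: lines look like "  BTF     <object>" for vmlinux and
# "  BTF [M] <module>.ko" for modules.
count_vmlinux_btf_steps() {
    # Keep lines whose first word is BTF, then count those without "[M]".
    # "|| true" keeps the exit status at 0 when the count is zero.
    grep -E '^[[:space:]]*BTF[[:space:]]' "$1" | grep -v -c -F '[M]' || true
}
```

A count above one is only a hint that vmlinux BTF may have been
regenerated, since some kernel versions may legitimately run more than
one vmlinux link pass.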

I've tried to reproduce this; no luck so far at my end.

Alan

> Laura
> 
> [1] https://pastebin.com/raw/dtvzBkxh
> [2] https://pastebin.com/raw/a1MGi3wH
> 
> #regzbot introduced: next-20241024..next-20241025
> 
> 




