Re: [PATCH v3 04/12] module: Add printk format to add module build ID to stacktraces

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue 2021-03-30 20:05:12, Stephen Boyd wrote:
> Let's make kernel stacktraces easier to identify by including the build
> ID[1] of a module if the stacktrace is printing a symbol from a module.
> This makes it simpler for developers to locate a kernel module's full
> debuginfo for a particular stacktrace. Combined with
> scripts/decode_stracktrace.sh, a developer can download the matching
> debuginfo from a debuginfod[2] server and find the exact file and line
> number for the functions plus offsets in a stacktrace that match the
> module. This is especially useful for pstore crash debugging where the
> kernel crashes are recorded in something like console-ramoops and the
> recovery kernel/modules are different or the debuginfo doesn't exist on
> the device due to space concerns (the debuginfo can be too large for
> space limited devices).
> 
> Originally, I put this on the %pS format, but that was quickly rejected
> given that %pS is used in other places such as ftrace where build IDs
> aren't meaningful. There was some discussions on the list to put every
> module build ID into the "Modules linked in:" section of the stacktrace
> message but that quickly becomes very hard to read once you have more
> than three or four modules linked in. It also provides too much
> information when we don't expect each module to be traversed in a
> stacktrace. Having the build ID for modules that aren't important just
> makes things messy. Splitting it to multiple lines for each module
> quickly explodes the number of lines printed in an oops too, possibly
> wrapping the warning off the console. And finally, trying to stash away
> each module used in a callstack to provide the ID of each symbol printed
> is cumbersome and would require changes to each architecture to stash
> away modules and return their build IDs once unwinding has completed.
> 
> Instead, we opt for the simpler approach of introducing a new printk
> format '%pS[R]b' for "pointer symbolic backtrace with module build ID"
> and then updating the few places in the architecture layer where the
> stacktrace is printed to use this new format.
> 
> Example:
> 
>  WARNING: CPU: 3 PID: 3373 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm]
>  Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE hci_uart <modules trimmed>
>  CPU: 3 PID: 3373 Comm: bash Not tainted 5.11 #12 a8c0d47f7051f3e6670ceaea724af66a39c6cec8
>  Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
>  pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--)
>  pc : lkdtm_WARNING+0x28/0x30 [lkdtm]
>  lr : lkdtm_do_action+0x24/0x40 [lkdtm]
>  sp : ffffffc013febca0
>  x29: ffffffc013febca0 x28: ffffff88d9438040
>  x27: 0000000000000000 x26: 0000000000000000
>  x25: 0000000000000000 x24: ffffffdd0e9772c0
>  x23: 0000000000000020 x22: ffffffdd0e975366
>  x21: ffffffdd0e9772e0 x20: ffffffc013febde0
>  x19: 0000000000000008 x18: 0000000000000000
>  x17: 0000000000000000 x16: 0000000000000037
>  x15: ffffffdd102ab174 x14: 0000000000000003
>  x13: 0000000000000004 x12: 0000000000000000
>  x11: 0000000000000000 x10: 0000000000000000
>  x9 : 0000000000000001 x8 : ffffffdd0e979000
>  x7 : 0000000000000000 x6 : ffffffdd10ff6b54
>  x5 : 0000000000000000 x4 : 0000000000000000
>  x3 : ffffffc013feb938 x2 : ffffff89fef05a70
>  x1 : ffffff89feef5788 x0 : ffffffdd0e9772e0
>  Call trace:
>   lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
>   direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
>   full_proxy_write+0x74/0xa4
>   vfs_write+0xec/0x2e8
>   ksys_write+0x84/0xf0
>   __arm64_sys_write+0x24/0x30
>   el0_svc_common+0xf4/0x1c0
>   do_el0_svc_compat+0x28/0x3c
>   el0_svc_compat+0x10/0x1c
>   el0_sync_compat_handler+0xa8/0xcc
>   el0_sync_compat+0x178/0x180
>  ---[ end trace f89bc7f5417cbcc6 ]---

I used this test patch:

diff --git a/lib/test_printf.c b/lib/test_printf.c
index 95a2f82427c7..06716948de3c 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -686,6 +686,7 @@ test_pointer(void)
        kernel_ptr();
        struct_resource();
        addr();
+       BUG();
        escaped_str();
        hex_string();
        mac();

Then I did

  # modprobe test_printf

and got the following:

[  125.707281][ T1837] test_printf: loaded.
[  125.707707][ T1837] ------------[ cut here ]------------
[  125.708227][ T1837] kernel BUG at lib/test_printf.c:689!
[  125.709102][ T1837] invalid opcode: 0000 [#1] SMP NOPTI
[  125.709111][ T1837] CPU: 3 PID: 1837 Comm: modprobe Kdump: loaded Tainted: G            E     5.12.0-rc6-default+ #171 00000080ffffffff000000000000000000000000
[  125.712102][ T1837] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[  125.712191][ T1837] RIP: 0010:test_printf_init+0x561/0xc99 [test_printf c2388ff0552611501b4d2ad58d8e5ca441d9a350]
[  125.712233][ T1837] Code: 00 48 c7 c7 b8 96 0f c0 e8 19 f9 ff ff b9 ab 00 00 00 48 c7 c2 93 96 0f c0 be 08 00 00 00 48 c7 c7 af 96 0f c0 e8 fc f8 ff ff <0f> 0b 8b 05 44 07 00 00 8b 35 3a 07 00 00 8b 1d 3c 07 00 00 85 c0
[  125.712235][ T1837] RSP: 0018:ffffb060004abc78 EFLAGS: 00010282
[  125.712240][ T1837] RAX: 0000000000000000 RBX: ffffb060004abc80 RCX: ffffd05fffc00b70
[  125.712241][ T1837] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffad352fd5
[  125.712242][ T1837] RBP: ffffffffc00fd367 R08: 0000000000000001 R09: 0000000000000001
[  125.712243][ T1837] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b6cc49ffc00
[  125.712244][ T1837] R13: 0000000000000001 R14: ffff9b6cc2ee0000 R15: ffffb060004abe90
[  125.712247][ T1837] FS:  00007f70c4466b80(0000) GS:ffff9b6d3fc00000(0000) knlGS:0000000000000000
[  125.712248][ T1837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  125.712249][ T1837] CR2: 00007ffdb02e8b38 CR3: 000000010b6f2002 CR4: 0000000000370ee0
[  125.712257][ T1837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  125.712257][ T1837] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  125.712258][ T1837] Call Trace:
[  125.712275][ T1837]  ? __test+0x13c/0x149 [test_printf]
[  125.723404][ T1837]  ? rcu_read_lock_sched_held+0x52/0x80
[  125.723425][ T1837]  do_one_initcall+0x5b/0x2d0
[  125.724367][ T1837]  do_init_module+0x5b/0x21c
[  125.724377][ T1837]  load_module+0x1eaa/0x23c0
[  125.725381][ T1837]  ? show_modinfo_version+0x30/0x30
[  125.725422][ T1837]  ? __do_sys_finit_module+0xad/0x110
[  125.725425][ T1837]  __do_sys_finit_module+0xad/0x110
[  125.725431][ T1837]  do_syscall_64+0x33/0x40
[  125.725464][ T1837]  entry_SYSCALL_64_after_hwframe+0x44/0xae

It shows wrong build id for vmlinux.
And it does not show the build if for the module at all.


readelf shows the following:

# readelf -Wn /lib/modules/5.12.0-rc6-default+/kernel/lib/test_printf.ko

Displaying notes found in: .note.gnu.build-id
  Owner                 Data size       Description
  GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)         Build ID: c2388ff0552611501b4d2ad58d8e5ca441d9a350


Best Regards,
Petr



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux