On Tue 2021-03-30 20:05:12, Stephen Boyd wrote: > Let's make kernel stacktraces easier to identify by including the build > ID[1] of a module if the stacktrace is printing a symbol from a module. > This makes it simpler for developers to locate a kernel module's full > debuginfo for a particular stacktrace. Combined with > scripts/decode_stracktrace.sh, a developer can download the matching > debuginfo from a debuginfod[2] server and find the exact file and line > number for the functions plus offsets in a stacktrace that match the > module. This is especially useful for pstore crash debugging where the > kernel crashes are recorded in something like console-ramoops and the > recovery kernel/modules are different or the debuginfo doesn't exist on > the device due to space concerns (the debuginfo can be too large for > space limited devices). > > Originally, I put this on the %pS format, but that was quickly rejected > given that %pS is used in other places such as ftrace where build IDs > aren't meaningful. There was some discussions on the list to put every > module build ID into the "Modules linked in:" section of the stacktrace > message but that quickly becomes very hard to read once you have more > than three or four modules linked in. It also provides too much > information when we don't expect each module to be traversed in a > stacktrace. Having the build ID for modules that aren't important just > makes things messy. Splitting it to multiple lines for each module > quickly explodes the number of lines printed in an oops too, possibly > wrapping the warning off the console. And finally, trying to stash away > each module used in a callstack to provide the ID of each symbol printed > is cumbersome and would require changes to each architecture to stash > away modules and return their build IDs once unwinding has completed. > > Instead, we opt for the simpler approach of introducing a new printk > format '%pS[R]b' for "pointer symbolic backtrace with module build ID" > and then updating the few places in the architecture layer where the > stacktrace is printed to use this new format. > > Example: > > WARNING: CPU: 3 PID: 3373 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm] > Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE hci_uart <modules trimmed> > CPU: 3 PID: 3373 Comm: bash Not tainted 5.11 #12 a8c0d47f7051f3e6670ceaea724af66a39c6cec8 > Hardware name: Google Lazor (rev3+) with KB Backlight (DT) > pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--) > pc : lkdtm_WARNING+0x28/0x30 [lkdtm] > lr : lkdtm_do_action+0x24/0x40 [lkdtm] > sp : ffffffc013febca0 > x29: ffffffc013febca0 x28: ffffff88d9438040 > x27: 0000000000000000 x26: 0000000000000000 > x25: 0000000000000000 x24: ffffffdd0e9772c0 > x23: 0000000000000020 x22: ffffffdd0e975366 > x21: ffffffdd0e9772e0 x20: ffffffc013febde0 > x19: 0000000000000008 x18: 0000000000000000 > x17: 0000000000000000 x16: 0000000000000037 > x15: ffffffdd102ab174 x14: 0000000000000003 > x13: 0000000000000004 x12: 0000000000000000 > x11: 0000000000000000 x10: 0000000000000000 > x9 : 0000000000000001 x8 : ffffffdd0e979000 > x7 : 0000000000000000 x6 : ffffffdd10ff6b54 > x5 : 0000000000000000 x4 : 0000000000000000 > x3 : ffffffc013feb938 x2 : ffffff89fef05a70 > x1 : ffffff89feef5788 x0 : ffffffdd0e9772e0 > Call trace: > lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9] > direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9] > full_proxy_write+0x74/0xa4 > vfs_write+0xec/0x2e8 > ksys_write+0x84/0xf0 > __arm64_sys_write+0x24/0x30 > el0_svc_common+0xf4/0x1c0 > do_el0_svc_compat+0x28/0x3c > el0_svc_compat+0x10/0x1c > el0_sync_compat_handler+0xa8/0xcc > el0_sync_compat+0x178/0x180 > ---[ end trace f89bc7f5417cbcc6 ]--- I used this test patch: diff --git a/lib/test_printf.c b/lib/test_printf.c index 95a2f82427c7..06716948de3c 100644 --- a/lib/test_printf.c +++ b/lib/test_printf.c @@ -686,6 +686,7 @@ test_pointer(void) kernel_ptr(); struct_resource(); addr(); + BUG(); escaped_str(); hex_string(); mac(); Then I did # modprobe test_printf and got the following: [ 125.707281][ T1837] test_printf: loaded. [ 125.707707][ T1837] ------------[ cut here ]------------ [ 125.708227][ T1837] kernel BUG at lib/test_printf.c:689! [ 125.709102][ T1837] invalid opcode: 0000 [#1] SMP NOPTI [ 125.709111][ T1837] CPU: 3 PID: 1837 Comm: modprobe Kdump: loaded Tainted: G E 5.12.0-rc6-default+ #171 00000080ffffffff000000000000000000000000 [ 125.712102][ T1837] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 [ 125.712191][ T1837] RIP: 0010:test_printf_init+0x561/0xc99 [test_printf c2388ff0552611501b4d2ad58d8e5ca441d9a350] [ 125.712233][ T1837] Code: 00 48 c7 c7 b8 96 0f c0 e8 19 f9 ff ff b9 ab 00 00 00 48 c7 c2 93 96 0f c0 be 08 00 00 00 48 c7 c7 af 96 0f c0 e8 fc f8 ff ff <0f> 0b 8b 05 44 07 00 00 8b 35 3a 07 00 00 8b 1d 3c 07 00 00 85 c0 [ 125.712235][ T1837] RSP: 0018:ffffb060004abc78 EFLAGS: 00010282 [ 125.712240][ T1837] RAX: 0000000000000000 RBX: ffffb060004abc80 RCX: ffffd05fffc00b70 [ 125.712241][ T1837] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffad352fd5 [ 125.712242][ T1837] RBP: ffffffffc00fd367 R08: 0000000000000001 R09: 0000000000000001 [ 125.712243][ T1837] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b6cc49ffc00 [ 125.712244][ T1837] R13: 0000000000000001 R14: ffff9b6cc2ee0000 R15: ffffb060004abe90 [ 125.712247][ T1837] FS: 00007f70c4466b80(0000) GS:ffff9b6d3fc00000(0000) knlGS:0000000000000000 [ 125.712248][ T1837] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 125.712249][ T1837] CR2: 00007ffdb02e8b38 CR3: 000000010b6f2002 CR4: 0000000000370ee0 [ 125.712257][ T1837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 125.712257][ T1837] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 125.712258][ T1837] Call Trace: [ 125.712275][ T1837] ? __test+0x13c/0x149 [test_printf] [ 125.723404][ T1837] ? rcu_read_lock_sched_held+0x52/0x80 [ 125.723425][ T1837] do_one_initcall+0x5b/0x2d0 [ 125.724367][ T1837] do_init_module+0x5b/0x21c [ 125.724377][ T1837] load_module+0x1eaa/0x23c0 [ 125.725381][ T1837] ? show_modinfo_version+0x30/0x30 [ 125.725422][ T1837] ? __do_sys_finit_module+0xad/0x110 [ 125.725425][ T1837] __do_sys_finit_module+0xad/0x110 [ 125.725431][ T1837] do_syscall_64+0x33/0x40 [ 125.725464][ T1837] entry_SYSCALL_64_after_hwframe+0x44/0xae It shows wrong build id for vmlinux. And it does not show the build if for the module at all. readelf shows the following: # readelf -Wn /lib/modules/5.12.0-rc6-default+/kernel/lib/test_printf.ko Displaying notes found in: .note.gnu.build-id Owner Data size Description GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: c2388ff0552611501b4d2ad58d8e5ca441d9a350 Best Regards, Petr