On Mon, 30 Dec 2024 13:55:00 -0500 Genes Lists <lists@xxxxxxxxxxxx> wrote: > On Fri, 2024-12-27 at 11:15 -0500, Genes Lists wrote: > > I have not had a chance to bisect this yet but since its in stable > > thought it best to share sooner than later. > > > > If noone spots anything, I will do a bisect soon as I can. Boot > > completes and aside from this error things seem fine. > > > > - Does not happen with 6.12.6. > > - Does happen with Mainline + patch for different drm bug > > mainline commit : d6ef8b40d075c425f548002d2f35ae3f06e9cf96 > > Patch: https://patchwork.freedesktop.org/series/141911/ > > Patch fixes drm bug reported here (patch not yet in mainline): > > > > https://lore.kernel.org/lkml/3b097dddd7095bccabe6791b90899c689f271a35 > > .camel@xxxxxxxxxxxx/ > > > > Graphics h/w: > > 00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P > > [Iris Xe Graphics] (rev 04) > > > > Trace below and full dmesg attached. > > > > gene > > > > [ +0.143567] ------------[ cut here ]------------ > > [ +0.000003] WARNING: CPU: 6 PID: 344 at > > kernel/trace/trace_events.c:577 trace_event_raw_init+0x159/0x660 > > [ +0.000005] Modules linked in: xe(+) drm_ttm_helper gpu_sched > > drm_suballoc_helper drm_gpuvm drm_exec typec_displayport usbhid > > hid_sensor_custom hid_sensor_hub intel_ishtp_hid nvme > > spi_pxa2xx_platform hid_multitouch dw_dmac nvme_core 8250_dw > > spi_pxa2xx_core i915 crct10dif_pclmul crc32_pclmul i2c_algo_bit > > crc32c_intel psmouse polyval_clmulni serio_raw polyval_generic > > drm_buddy atkbd ghash_clmulni_intel libps2 vivaldi_fmap ttm > > sha512_ssse3 sha256_ssse3 intel_gtt sha1_ssse3 aesni_intel gf128mul > > drm_display_helper ucsi_acpi crypto_simd typec_ucsi intel_lpss_pci > > intel_ish_ipc cryptd intel_lpss roles intel_ishtp idma64 cec vmd > > typec > > i2c_hid_acpi i8042 video i2c_hid serio wmi pkcs8_key_parser wireguard > > curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 > > ip6_udp_tunnel udp_tunnel libcurve25519_generic libchacha loop sg > > crypto_user > > [ +0.000022] CPU: 6 UID: 0 PID: 344 Comm: (udev-worker) Not tainted > > 6.12.7-stable-1 #16 8ef5eca39d52f9e2c5a50ace761c4a6858369a18 > > [ +0.000003] Hardware name: Dell Inc. XPS 9320/0CR6NC, BIOS 2.16.1 > > 09/11/2024 > > [ +0.000001] RIP: 0010:trace_event_raw_init+0x159/0x660 > > [ +0.000002] Code: 89 ea 0f 83 3b 04 00 00 e8 84 db ff ff 84 c0 74 > > 10 > > 8b 0c 24 48 c7 c0 fe ff ff ff 48 d3 c0 49 21 c6 4d 85 f6 0f 84 d6 fe > > ff > > ff <0f> 0b bb 01 00 00 00 41 f6 c6 01 0f 85 f3 78 bf 00 66 0f 1f 44 > > 00 > > [ +0.000001] RSP: 0018:ffffa3c8007bb980 EFLAGS: 00010206 > > [ +0.000001] RAX: ffffffffffffffdf RBX: ffffffffc0ef8731 RCX: > > 0000000000000005 > > [ +0.000001] RDX: 0000000000000002 RSI: 0000000000000001 RDI: > > ffffffffc0ef8727 > > [ +0.000001] RBP: ffffffffc0ef8640 R08: 0000000000000039 R09: > > 0000000000000000 > > [ +0.000000] R10: 0000000000000076 R11: 000000000000004e R12: > > 00000000000000f2 > > [ +0.000001] R13: ffffffffc0ef9760 R14: 0000000000000018 R15: > > 0000000000000000 > > [ +0.000001] FS: 000075f6c10d0880(0000) GS:ffff96b92f500000(0000) > > knlGS:0000000000000000 > > [ +0.000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ +0.000000] CR2: 00007be641f52f70 CR3: 000000010cffe000 CR4: > > 0000000000f50ef0 > > [ +0.000001] PKRU: 55555554 > > [ +0.000001] Call Trace: > > [ +0.000001] <TASK> > > [ +0.000000] ? trace_event_raw_init+0x159/0x660 > > [ +0.000002] ? __warn.cold+0x93/0xf6 > > [ +0.000001] ? trace_event_raw_init+0x159/0x660 > > [ +0.000003] ? report_bug+0xff/0x140 > > [ +0.000002] ? handle_bug+0x58/0x90 > > [ +0.000001] ? exc_invalid_op+0x17/0x70 > > [ +0.000002] ? asm_exc_invalid_op+0x1a/0x20 > > [ +0.000003] ? trace_event_raw_init+0x159/0x660 > > [ +0.000002] event_init+0x28/0x70 > > [ +0.000002] trace_module_notify+0x19a/0x250 > > [ +0.000001] notifier_call_chain+0x5a/0xd0 > > [ +0.000002] blocking_notifier_call_chain_robust+0x65/0xc0 > > [ +0.000002] load_module+0x1818/0x1ce0 > > [ +0.000004] ? init_module_from_file+0x89/0xe0 > > [ +0.000001] init_module_from_file+0x89/0xe0 > > [ +0.000002] idempotent_init_module+0x11e/0x310 > > [ +0.000002] __x64_sys_finit_module+0x5e/0xb0 > > [ +0.000002] do_syscall_64+0x82/0x160 > > [ +0.000002] ? syscall_exit_to_user_mode+0x10/0x200 > > [ +0.000001] ? do_syscall_64+0x8e/0x160 > > [ +0.000001] ? syscall_exit_to_user_mode+0x10/0x200 > > [ +0.000001] ? do_syscall_64+0x8e/0x160 > > [ +0.000001] ? do_sys_openat2+0x9c/0xe0 > > [ +0.000002] ? syscall_exit_to_user_mode+0x10/0x200 > > [ +0.000001] ? do_syscall_64+0x8e/0x160 > > [ +0.000001] ? do_sys_openat2+0x9c/0xe0 > > [ +0.000002] ? syscall_exit_to_user_mode+0x10/0x200 > > [ +0.000001] ? do_syscall_64+0x8e/0x160 > > [ +0.000001] ? exc_page_fault+0x7e/0x180 > > [ +0.000001] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > [ +0.000002] RIP: 0033:0x75f6c18c71fd > > [ +0.000011] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e > > fa > > 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 > > 0f > > 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 fa 0c 00 f7 d8 64 89 01 > > 48 > > [ +0.000001] RSP: 002b:00007ffdec463948 EFLAGS: 00000246 ORIG_RAX: > > 0000000000000139 > > [ +0.000001] RAX: ffffffffffffffda RBX: 000057194ec1c5c0 RCX: > > 000075f6c18c71fd > > [ +0.000001] RDX: 0000000000000004 RSI: 000075f6c10ca05d RDI: > > 000000000000002b > > [ +0.000000] RBP: 00007ffdec463a00 R08: 0000000000000001 R09: > > 00007ffdec463990 > > [ +0.000001] R10: 0000000000000040 R11: 0000000000000246 R12: > > 000075f6c10ca05d > > [ +0.000000] R13: 0000000000020000 R14: 000057194ec1dc40 R15: > > 000057194ec1e900 > > [ +0.000001] </TASK> > > [ +0.000001] ---[ end trace 0000000000000000 ]--- > > [ +0.000000] event xe_bo_move has unsafe dereference of argument 4 > > [ +0.000001] print_fmt: "move_lacks_source:%s, migrate object %p > > [size > > %zu] from %s to %s device_id:%s", REC->move_lacks_source ? "yes" : > > "no", REC->bo, REC->size, xe_mem_type_to_name[REC->old_placement], > > xe_mem_type_to_name[REC->new_placement], __get_str(device_id) > > Yep. That's a false positive :-( We stopped doing runtime checks of '%s' as it required using va_list "hacks", so the checks have been moved to the boot up code. The above has: TRACE_EVENT(xe_bo_move, TP_PROTO(struct xe_bo *bo, uint32_t new_placement, uint32_t old_placement, bool move_lacks_source), TP_ARGS(bo, new_placement, old_placement, move_lacks_source), TP_STRUCT__entry( __field(struct xe_bo *, bo) __field(size_t, size) __field(u32, new_placement) __field(u32, old_placement) __string(device_id, __dev_name_bo(bo)) __field(bool, move_lacks_source) ), TP_fast_assign( __entry->bo = bo; __entry->size = bo->size; __entry->new_placement = new_placement; __entry->old_placement = old_placement; __assign_str(device_id); __entry->move_lacks_source = move_lacks_source; ), TP_printk("move_lacks_source:%s, migrate object %p [size %zu] from %s to %s device_id:%s", __entry->move_lacks_source ? "yes" : "no", __entry->bo, __entry->size, xe_mem_type_to_name[__entry->old_placement], xe_mem_type_to_name[__entry->new_placement], __get_str(device_id)) Where the xe_mem_type_to_name is an array of strings, that I assume will always be there for the file of the module. I guess the "fix" would be to have the check code ignore pointer to arrays, assuming they are "ok". The check is mostly to catch the naive use of: TP_fast_assign( __entry->str = passed_in_string; ) TP_printk("str=%s", __entry->str) Which was 99% of the bugs that this would catch. I would think if we have: TP_fast_assign( __entry->idx = passed_in_idx; ) TP_printk("str=%s", some_array[__entry->idx]) Would 99% be OK. For the 1% it's not, oh well, that would be a false negative. Still better than not having this check. -- Steve > > 1) rc5 has the same issue. > > 2) Result of git bisect starting with 6.12.6 is : > > gene > > > git bisect good > 683eccacc02d2eb25d1c34b8fb0363fcc7e08f64 is the first bad commit > commit 683eccacc02d2eb25d1c34b8fb0363fcc7e08f64 > Author: Steven Rostedt <rostedt@xxxxxxxxxxx> > Date: Mon Dec 16 21:41:21 2024 -0500 > > tracing: Add "%s" check in test_event_printk() > > commit 65a25d9f7ac02e0cf361356e834d1c71d36acca9 upstream. > > The test_event_printk() code makes sure that when a trace event is > registered, any dereferenced pointers in from the event's > TP_printk() are > pointing to content in the ring buffer. But currently it does not > handle > "%s", as there's cases where the string pointer saved in the ring > buffer > points to a static string in the kernel that will never be freed. > As that > is a valid case, the pointer needs to be checked at runtime. > > Currently the runtime check is done via trace_check_vprintf(), but > to not > have to replicate everything in vsnprintf() it does some logic with > the > va_list that may not be reliable across architectures. In order to > get rid > of that logic, more work in the test_event_printk() needs to be > done. Some > of the strings can be validated at this time when it is obvious the > string > is valid because the string will be saved in the ring buffer > content. > > Do all the validation of strings in the ring buffer at boot in > test_event_printk(), and make sure that the field of the strings > that > point into the kernel are accessible. This will allow adding checks > at > runtime that will validate the fields themselves and not rely on > paring > the TP_printk() format at runtime. > > Cc: stable@xxxxxxxxxxxxxxx > Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx> > Cc: Mark Rutland <mark.rutland@xxxxxxx> > Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> > Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > Link: https://lore.kernel.org/20241217024720.685917008@xxxxxxxxxxx > Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts > for dereferencing pointers") > Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx> > Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> > > kernel/trace/trace_events.c | 104 > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---- > -------- > 1 file changed, 89 insertions(+), 15 deletions(-) > > 3) The git bisect log is: > > git bisect start > # status: waiting for both good and bad commits > # good: [e9d65b48ce1aba50e9ec7eab6d9f73d1ba88420e] Linux 6.12.6 > git bisect good e9d65b48ce1aba50e9ec7eab6d9f73d1ba88420e > # bad: [319addc2ad901dac4d6cc931d77ef35073e0942f] Linux 6.12.7 > git bisect bad 319addc2ad901dac4d6cc931d77ef35073e0942f > # good: [ced29f2149c9b65ca01008ba64d9db993ad8fad1] USB: serial: > option: add Telit FE910C04 rmnet compositions > git bisect good ced29f2149c9b65ca01008ba64d9db993ad8fad1 > # good: [b2f23a1c7181f8c562ea06d3ea14879334d6bdf4] drm/amdgpu/gfx12: > fix IP version check > git bisect good b2f23a1c7181f8c562ea06d3ea14879334d6bdf4 > # bad: [2976661ed4de10bd1945e7cca0d7363b44f240dd] of/irq: Fix > interrupt-map cell length check in of_irq_parse_imap_parent() > git bisect bad 2976661ed4de10bd1945e7cca0d7363b44f240dd > # good: [3985cd30472a6a67bfe3f0b42ad23b59fa824033] tracing: Fix > test_event_printk() to process entire print argument > git bisect good 3985cd30472a6a67bfe3f0b42ad23b59fa824033 > # bad: [416226eb3f3a3008456903fc3695b0efa8ceafa1] selftests/bpf: Use > asm constraint "m" for LoongArch > git bisect bad 416226eb3f3a3008456903fc3695b0efa8ceafa1 > # bad: [683eccacc02d2eb25d1c34b8fb0363fcc7e08f64] tracing: Add "%s" > check in test_event_printk() > git bisect bad 683eccacc02d2eb25d1c34b8fb0363fcc7e08f64 > # good: [71a613ceac1ad6235e164aea34abcddb258296af] tracing: Add > missing helper functions in event pointer dereference check > git bisect good 71a613ceac1ad6235e164aea34abcddb258296af > # first bad commit: [683eccacc02d2eb25d1c34b8fb0363fcc7e08f64] > tracing: Add "%s" check in test_event_printk() > >