On Tue, Jul 4, 2023 at 4:59 PM Holger Hoffstätte <holger@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2023-07-05 00:42, Matthew Wilcox wrote:
> > On Tue, Jul 04, 2023 at 11:34:27PM +0200, Holger Hoffstätte wrote:
> >> I applied the fix and did a clean rebuild. The first attempt to boot resulted in
> >> the following oops, though it kind of continued:
> >
> > It would be helpful to run this through decode_stacktrace.sh
> >
> >> Jul 4 22:35:22 hho kernel: BUG: kernel NULL pointer dereference, address: 0000000000000052
> >> Jul 4 22:35:22 hho kernel: #PF: supervisor read access in kernel mode
> >> Jul 4 22:35:22 hho kernel: #PF: error_code(0x0000) - not-present page
> >> Jul 4 22:35:22 hho kernel: PGD 0 P4D 0
> >> Jul 4 22:35:22 hho kernel: Oops: 0000 [#1] SMP
> >> Jul 4 22:35:22 hho kernel: CPU: 10 PID: 1740 Comm: start-stop-daem Not tainted 6.4.1 #1
> >> Jul 4 22:35:22 hho kernel: Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
> >> Jul 4 22:35:22 hho kernel: RIP: 0010:wq_worker_comm+0x63/0xc0
> >> Jul 4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b
> >
> > Faulting insn:
> >
> >    0:   4c 8b 70 48             mov    0x48(%rax),%r14
> >
> > and rax is 0xa, which matches up with 0x52 as the faulting address.
> >
> > I'm not sure this is related to the VMA patches.  It might be something
> > unrelated that doesn't often come up?
>
> See below for the reveal!
> > >> Jul 4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202 > >> Jul 4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608 > >> Jul 4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300 > >> Jul 4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040 > >> Jul 4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8 > >> Jul 4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001 > >> Jul 4 22:35:22 hho kernel: FS: 00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000 > >> Jul 4 22:35:22 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> Jul 4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0 > >> Jul 4 22:35:22 hho kernel: Call Trace: > >> Jul 4 22:35:22 hho kernel: <TASK> > >> Jul 4 22:35:22 hho kernel: ? __die+0x1f/0x60 > >> Jul 4 22:35:22 hho kernel: ? page_fault_oops+0x14d/0x410 > >> Jul 4 22:35:22 hho kernel: ? xa_load+0x82/0xa0 > >> Jul 4 22:35:22 hho kernel: ? exc_page_fault+0x60/0x100 > >> Jul 4 22:35:22 hho kernel: ? asm_exc_page_fault+0x22/0x30 > >> Jul 4 22:35:22 hho kernel: ? wq_worker_comm+0x63/0xc0 > >> Jul 4 22:35:22 hho last message buffered 1 times > >> Jul 4 22:35:22 hho kernel: proc_task_name+0xa4/0xb0 > >> Jul 4 22:35:22 hho kernel: ? seq_put_decimal_ull_width+0x96/0x100 > >> Jul 4 22:35:22 hho kernel: do_task_stat+0x44b/0xe10 > >> Jul 4 22:35:22 hho kernel: proc_single_show+0x4b/0xa0 > >> Jul 4 22:35:22 hho kernel: seq_read_iter+0xff/0x410 > >> Jul 4 22:35:22 hho kernel: ? generic_fillattr+0x45/0xf0 > >> Jul 4 22:35:22 hho kernel: seq_read+0x93/0xb0 > >> Jul 4 22:35:22 hho kernel: vfs_read+0x9b/0x2c0 > >> Jul 4 22:35:22 hho kernel: ? 
__do_sys_newfstatat+0x22/0x30 > >> Jul 4 22:35:22 hho kernel: ksys_read+0x53/0xc0 > >> Jul 4 22:35:22 hho kernel: do_syscall_64+0x35/0x80 > >> Jul 4 22:35:22 hho kernel: entry_SYSCALL_64_after_hwframe+0x46/0xb0 > >> Jul 4 22:35:22 hho kernel: RIP: 0033:0x7f39ddf5877d > >> Jul 4 22:35:22 hho kernel: Code: b9 fe ff ff 48 8d 3d 1a 71 0a 00 50 e8 2c 12 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d 81 4c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83 > >> Jul 4 22:35:22 hho kernel: RSP: 002b:00007ffe4b98b6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > >> Jul 4 22:35:22 hho kernel: RAX: ffffffffffffffda RBX: 00005655194cab40 RCX: 00007f39ddf5877d > >> Jul 4 22:35:22 hho kernel: RDX: 0000000000000400 RSI: 00005655194ccd30 RDI: 0000000000000004 > >> Jul 4 22:35:22 hho kernel: RBP: 00007ffe4b98b760 R08: 00007f39ddff8cb2 R09: 0000000000000001 > >> Jul 4 22:35:22 hho kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 00007f39de0324a0 > >> Jul 4 22:35:22 hho kernel: R13: 00005655194cd140 R14: 0000000000000a68 R15: 00007f39de031ba0 > >> Jul 4 22:35:22 hho kernel: </TASK> > >> Jul 4 22:35:22 hho kernel: Modules linked in: mousedev sch_fq_codel bpf_preload snd_ctl_led amdgpu iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi mac80211 pkcs8_key_parser drm_ttm_helper ttm iommu_v2 gpu_sched snd_hda_intel libarc4 i2c_algo_bit snd_intel_dspcfg drm_buddy drm_suballoc_helper uvcvideo snd_hda_codec drm_display_helper edac_mce_amd videobuf2_vmalloc snd_hwdep crct10dif_pclmul videobuf2_memops uvc crc32_pclmul cec snd_hda_core crc32c_intel videobuf2_v4l2 ghash_clmulni_intel lm92 r8169 sha512_ssse3 snd_pcm videodev psmouse thinkpad_acpi iwlwifi drivetemp ledtrig_audio drm_kms_helper rapl videobuf2_common realtek snd_timer serio_raw snd_rn_pci_acp3x wmi_bmof platform_profile cfg80211 mc snd_acp_config k10temp snd syscopyarea mdio_devres ucsi_acpi snd_soc_acpi sysfillrect drm snd_pci_acp3x i2c_piix4 sysimgblt 
soundcore typec_ucsi ipmi_devintf rfkill roles libphy ipmi_msghandler typec video battery ac wmi i2c_scmi button > >> Jul 4 22:35:22 hho kernel: CR2: 0000000000000052 > >> Jul 4 22:35:22 hho kernel: ---[ end trace 0000000000000000 ]--- > >> Jul 4 22:35:22 hho kernel: RIP: 0010:wq_worker_comm+0x63/0xc0 > >> Jul 4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b > >> Jul 4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202 > >> Jul 4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608 > >> Jul 4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300 > >> Jul 4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040 > >> Jul 4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8 > >> Jul 4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001 > >> Jul 4 22:35:22 hho kernel: FS: 00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000 > >> Jul 4 22:35:22 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> Jul 4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0 > >> Jul 4 22:35:22 hho kernel: note: start-stop-daem[1740] exited with irqs disabled > >> Jul 4 22:35:22 hho kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC) > >> Jul 4 22:35:22 hho kernel: r8169 0000:02:00.0 eth0: Link is Down > >> Jul 4 22:35:24 hho kernel: r8169 0000:02:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx > >> Jul 4 22:35:24 hho kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > >> > >> It then kind of limped along until I rebooted again. 
This second attempt to boot
> >> died and locked up completely, again during amdgpu initialization, and is on display here:
> >> https://imgur.com/a/3ZE66kh
> >
> > refill_obj_stock() is also somewhat unrelated to VMA stuff.  This is
> > all very bizarre.
> >
> >> Finally I just edited mm/Kconfig and set config PER_VMA_LOCK to "defbool n" to override
> >> any setting in my old config. That made everything work again - it's what I'm using now.
> >
> > Could I ask you to try a few boots with PER_VMA_LOCK set to "n", just
> > to eliminate the possibility that this is a coincidence?
> >
>
> HOLY SMOKES! You are on to something! I wanted to do 10 reboots and didn't expect
> anything to happen since this has been working fine since forever, and I don't boot
> that often since suspend is quite reliable these days. It did 9 without problems and
> then on the 10th reboot it crapped out, again with the xa_load pagefault.

Ok, sounds like the results of the fix are inconclusive. I guess we
should wait for more testing before concluding whether the fix is valid.
In the meantime, per Andrew's request, I posted the patchset that
includes both the fix and the proper kill switch of the feature at
https://lore.kernel.org/all/20230705063711.2670599-1-surenb@xxxxxxxxxx/.
Thanks,
Suren.

>
> Here's the first trace:
>
> holger>/tmp/linux-6.4.1/scripts/decode_stacktrace.sh /boot/kernel-genkernel-x86_64-6.4.1 < /tmp/kern.log
> Jul 4 22:35:22 hho kernel: [drm] Initialized amdgpu 3.52.0 20150101 for 0000:06:00.0 on minor 0
> Jul 4 22:35:22 hho kernel: fbcon: amdgpudrmfb (fb0) is primary device
> Jul 4 22:35:22 hho kernel: [drm] DSC precompute is not needed.
> Jul 4 22:35:22 hho kernel: Console: switching to colour frame buffer device 240x67 > Jul 4 22:35:22 hho kernel: amdgpu 0000:06:00.0: [drm] fb0: amdgpudrmfb frame buffer device > Jul 4 22:35:22 hho kernel: BUG: kernel NULL pointer dereference, address: 0000000000000052 > Jul 4 22:35:22 hho kernel: #PF: supervisor read access in kernel mode > Jul 4 22:35:22 hho kernel: #PF: error_code(0x0000) - not-present page > Jul 4 22:35:22 hho kernel: PGD 0 P4D 0 > Jul 4 22:35:22 hho kernel: Oops: 0000 [#1] SMP > Jul 4 22:35:22 hho kernel: CPU: 10 PID: 1740 Comm: start-stop-daem Not tainted 6.4.1 #1 > Jul 4 22:35:22 hho kernel: Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021 > Jul 4 22:35:22 hho kernel: RIP: wq_worker_comm+0x63/0xc0 > Jul 4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b > All code > ======== > 0: 43 2c 20 rex.XB sub $0x20,%al > 3: 75 1d jne 0x22 > 5: 5b pop %rbx > 6: 5d pop %rbp > 7: 48 c7 c7 e0 a4 43 82 mov $0xffffffff8243a4e0,%rdi > e: 41 5c pop %r12 > 10: 41 5d pop %r13 > 12: 41 5e pop %r14 > 14: e9 7e 6b 8b 00 jmp 0x8b6b97 > 19: 5b pop %rbx > 1a: 5d pop %rbp > 1b: 41 5c pop %r12 > 1d: 41 5d pop %r13 > 1f: 41 5e pop %r14 > 21: c3 ret > 22: 48 89 df mov %rbx,%rdi > 25: e8 ad 35 00 00 call 0x35d7 > 2a:* 4c 8b 70 48 mov 0x48(%rax),%r14 <-- trapping instruction > 2e: 48 89 c3 mov %rax,%rbx > 31: 4d 85 f6 test %r14,%r14 > 34: 74 cf je 0x5 > 36: 4c 89 f7 mov %r14,%rdi > 39: e8 29 b6 8b 00 call 0x8bb667 > 3e: 80 .byte 0x80 > 3f: 7b .byte 0x7b > > Code starting with the faulting instruction > =========================================== > 0: 4c 8b 70 48 mov 0x48(%rax),%r14 > 4: 48 89 c3 mov %rax,%rbx > 7: 4d 85 f6 test %r14,%r14 > a: 74 cf je 0xffffffffffffffdb > c: 4c 89 f7 mov %r14,%rdi > f: e8 29 b6 8b 00 call 0x8bb63d > 14: 80 .byte 0x80 > 15: 7b .byte 0x7b > 
Jul 4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202 > Jul 4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608 > Jul 4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300 > Jul 4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040 > Jul 4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8 > Jul 4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001 > Jul 4 22:35:22 hho kernel: FS: 00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000 > Jul 4 22:35:22 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Jul 4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0 > Jul 4 22:35:22 hho kernel: Call Trace: > Jul 4 22:35:22 hho kernel: <TASK> > Jul 4 22:35:22 hho kernel: ? __die+0x1f/0x60 > Jul 4 22:35:22 hho kernel: ? page_fault_oops+0x14d/0x410 > Jul 4 22:35:22 hho kernel: ? xa_load+0x82/0xa0 > Jul 4 22:35:22 hho kernel: ? exc_page_fault+0x60/0x100 > Jul 4 22:35:22 hho kernel: ? asm_exc_page_fault+0x22/0x30 > Jul 4 22:35:22 hho kernel: ? wq_worker_comm+0x63/0xc0 > Jul 4 22:35:22 hho last message buffered 1 times > Jul 4 22:35:22 hho kernel: proc_task_name+0xa4/0xb0 > Jul 4 22:35:22 hho kernel: ? seq_put_decimal_ull_width+0x96/0x100 > Jul 4 22:35:22 hho kernel: do_task_stat+0x44b/0xe10 > Jul 4 22:35:22 hho kernel: proc_single_show+0x4b/0xa0 > Jul 4 22:35:22 hho kernel: seq_read_iter+0xff/0x410 > Jul 4 22:35:22 hho kernel: ? generic_fillattr+0x45/0xf0 > Jul 4 22:35:22 hho kernel: seq_read+0x93/0xb0 > Jul 4 22:35:22 hho kernel: vfs_read+0x9b/0x2c0 > Jul 4 22:35:22 hho kernel: ? 
__do_sys_newfstatat+0x22/0x30 > Jul 4 22:35:22 hho kernel: ksys_read+0x53/0xc0 > Jul 4 22:35:22 hho kernel: do_syscall_64+0x35/0x80 > Jul 4 22:35:22 hho kernel: entry_SYSCALL_64_after_hwframe+0x46/0xb0 > Jul 4 22:35:22 hho kernel: RIP: 0033:0x7f39ddf5877d > Jul 4 22:35:22 hho kernel: Code: b9 fe ff ff 48 8d 3d 1a 71 0a 00 50 e8 2c 12 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d 81 4c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83 > All code > ======== > 0: b9 fe ff ff 48 mov $0x48fffffe,%ecx > 5: 8d 3d 1a 71 0a 00 lea 0xa711a(%rip),%edi # 0xa7125 > b: 50 push %rax > c: e8 2c 12 02 00 call 0x2123d > 11: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) > 18: 00 00 00 > 1b: 66 90 xchg %ax,%ax > 1d: 80 3d 81 4c 0e 00 00 cmpb $0x0,0xe4c81(%rip) # 0xe4ca5 > 24: 74 17 je 0x3d > 26: 31 c0 xor %eax,%eax > 28: 0f 05 syscall > 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction > 30: 77 5b ja 0x8d > 32: c3 ret > 33: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) > 3a: 00 00 00 > 3d: 53 push %rbx > 3e: 48 rex.W > 3f: 83 .byte 0x83 > > Code starting with the faulting instruction > =========================================== > 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax > 6: 77 5b ja 0x63 > 8: c3 ret > 9: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) > 10: 00 00 00 > 13: 53 push %rbx > 14: 48 rex.W > 15: 83 .byte 0x83 > Jul 4 22:35:22 hho kernel: RSP: 002b:00007ffe4b98b6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > Jul 4 22:35:22 hho kernel: RAX: ffffffffffffffda RBX: 00005655194cab40 RCX: 00007f39ddf5877d > Jul 4 22:35:22 hho kernel: RDX: 0000000000000400 RSI: 00005655194ccd30 RDI: 0000000000000004 > Jul 4 22:35:22 hho kernel: RBP: 00007ffe4b98b760 R08: 00007f39ddff8cb2 R09: 0000000000000001 > Jul 4 22:35:22 hho kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 00007f39de0324a0 > Jul 4 22:35:22 hho kernel: R13: 00005655194cd140 R14: 0000000000000a68 R15: 00007f39de031ba0 > Jul 4 22:35:22 hho 
kernel: </TASK> > Jul 4 22:35:22 hho kernel: Modules linked in: mousedev sch_fq_codel bpf_preload snd_ctl_led amdgpu iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi mac80211 pkcs8_key_parser drm_ttm_helper ttm iommu_v2 gpu_sched snd_hda_intel libarc4 i2c_algo_bit snd_intel_dspcfg drm_buddy drm_suballoc_helper uvcvideo snd_hda_codec drm_display_helper edac_mce_amd videobuf2_vmalloc snd_hwdep crct10dif_pclmul videobuf2_memops uvc crc32_pclmul cec snd_hda_core crc32c_intel videobuf2_v4l2 ghash_clmulni_intel lm92 r8169 sha512_ssse3 snd_pcm videodev psmouse thinkpad_acpi iwlwifi drivetemp ledtrig_audio drm_kms_helper rapl videobuf2_common realtek snd_timer serio_raw snd_rn_pci_acp3x wmi_bmof platform_profile cfg80211 mc snd_acp_config k10temp snd syscopyarea mdio_devres ucsi_acpi snd_soc_acpi sysfillrect drm snd_pci_acp3x i2c_piix4 sysimgblt soundcore typec_ucsi ipmi_devintf rfkill roles libphy ipmi_msghandler typec video battery ac wmi i2c_scmi button > Jul 4 22:35:22 hho kernel: CR2: 0000000000000052 > Jul 4 22:35:22 hho kernel: ---[ end trace 0000000000000000 ]--- > Jul 4 22:35:22 hho kernel: RIP: wq_worker_comm+0x63/0xc0 > Jul 4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b > All code > ======== > 0: 43 2c 20 rex.XB sub $0x20,%al > 3: 75 1d jne 0x22 > 5: 5b pop %rbx > 6: 5d pop %rbp > 7: 48 c7 c7 e0 a4 43 82 mov $0xffffffff8243a4e0,%rdi > e: 41 5c pop %r12 > 10: 41 5d pop %r13 > 12: 41 5e pop %r14 > 14: e9 7e 6b 8b 00 jmp 0x8b6b97 > 19: 5b pop %rbx > 1a: 5d pop %rbp > 1b: 41 5c pop %r12 > 1d: 41 5d pop %r13 > 1f: 41 5e pop %r14 > 21: c3 ret > 22: 48 89 df mov %rbx,%rdi > 25: e8 ad 35 00 00 call 0x35d7 > 2a:* 4c 8b 70 48 mov 0x48(%rax),%r14 <-- trapping instruction > 2e: 48 89 c3 mov %rax,%rbx > 31: 4d 85 f6 test %r14,%r14 > 34: 74 cf je 0x5 > 36: 4c 89 f7 mov %r14,%rdi > 
39: e8 29 b6 8b 00 call 0x8bb667 > 3e: 80 .byte 0x80 > 3f: 7b .byte 0x7b > > Code starting with the faulting instruction > =========================================== > 0: 4c 8b 70 48 mov 0x48(%rax),%r14 > 4: 48 89 c3 mov %rax,%rbx > 7: 4d 85 f6 test %r14,%r14 > a: 74 cf je 0xffffffffffffffdb > c: 4c 89 f7 mov %r14,%rdi > f: e8 29 b6 8b 00 call 0x8bb63d > 14: 80 .byte 0x80 > 15: 7b .byte 0x7b > Jul 4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202 > Jul 4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608 > Jul 4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300 > Jul 4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040 > Jul 4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8 > Jul 4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001 > Jul 4 22:35:22 hho kernel: FS: 00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000 > Jul 4 22:35:22 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Jul 4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0 > Jul 4 22:35:22 hho kernel: note: start-stop-daem[1740] exited with irqs disabled > Jul 4 22:35:22 hho kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC) > Jul 4 22:35:22 hho kernel: r8169 0000:02:00.0 eth0: Link is Down > Jul 4 22:35:24 hho kernel: r8169 0000:02:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx > Jul 4 22:35:24 hho kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > > Here is the second one from the reboot bonanza: > > holger>/tmp/linux-6.4.1/scripts/decode_stacktrace.sh /boot/kernel-genkernel-x86_64-6.4.1 < /tmp/kern.log > Jul 5 01:34:20 hho kernel: [drm] Initialized amdgpu 3.52.0 20150101 for 0000:06:00.0 on minor 0 > Jul 5 01:34:20 hho kernel: fbcon: amdgpudrmfb 
(fb0) is primary device > Jul 5 01:34:20 hho kernel: [drm] DSC precompute is not needed. > Jul 5 01:34:20 hho kernel: Console: switching to colour frame buffer device 240x67 > Jul 5 01:34:20 hho kernel: amdgpu 0000:06:00.0: [drm] fb0: amdgpudrmfb frame buffer device > Jul 5 01:34:20 hho kernel: BUG: kernel NULL pointer dereference, address: 0000000000000052 > Jul 5 01:34:20 hho kernel: #PF: supervisor read access in kernel mode > Jul 5 01:34:20 hho kernel: #PF: error_code(0x0000) - not-present page > Jul 5 01:34:20 hho kernel: PGD 0 P4D 0 > Jul 5 01:34:20 hho kernel: Oops: 0000 [#1] SMP > Jul 5 01:34:20 hho kernel: CPU: 8 PID: 1716 Comm: start-stop-daem Not tainted 6.4.1 #1 > Jul 5 01:34:20 hho kernel: Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021 > Jul 5 01:34:20 hho kernel: RIP: wq_worker_comm+0x63/0xc0 > Jul 5 01:34:20 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 2e 59 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 d9 a3 8b 00 80 7b > All code > ======== > 0: 43 2c 20 rex.XB sub $0x20,%al > 3: 75 1d jne 0x22 > 5: 5b pop %rbx > 6: 5d pop %rbp > 7: 48 c7 c7 e0 a4 43 82 mov $0xffffffff8243a4e0,%rdi > e: 41 5c pop %r12 > 10: 41 5d pop %r13 > 12: 41 5e pop %r14 > 14: e9 2e 59 8b 00 jmp 0x8b5947 > 19: 5b pop %rbx > 1a: 5d pop %rbp > 1b: 41 5c pop %r12 > 1d: 41 5d pop %r13 > 1f: 41 5e pop %r14 > 21: c3 ret > 22: 48 89 df mov %rbx,%rdi > 25: e8 ad 35 00 00 call 0x35d7 > 2a:* 4c 8b 70 48 mov 0x48(%rax),%r14 <-- trapping instruction > 2e: 48 89 c3 mov %rax,%rbx > 31: 4d 85 f6 test %r14,%r14 > 34: 74 cf je 0x5 > 36: 4c 89 f7 mov %r14,%rdi > 39: e8 d9 a3 8b 00 call 0x8ba417 > 3e: 80 .byte 0x80 > 3f: 7b .byte 0x7b > > Code starting with the faulting instruction > =========================================== > 0: 4c 8b 70 48 mov 0x48(%rax),%r14 > 4: 48 89 c3 mov %rax,%rbx > 7: 4d 85 f6 test %r14,%r14 > a: 74 cf je 0xffffffffffffffdb > c: 4c 89 f7 mov 
%r14,%rdi > f: e8 d9 a3 8b 00 call 0x8ba3ed > 14: 80 .byte 0x80 > 15: 7b .byte 0x7b > Jul 5 01:34:20 hho kernel: RSP: 0018:ffffc90001027bb8 EFLAGS: 00010202 > Jul 5 01:34:20 hho kernel: RAX: 000000000000000a RBX: ffff888111052640 RCX: 0001020304050608 > Jul 5 01:34:20 hho kernel: RDX: ffff88810490b300 RSI: 7fffffffffffffff RDI: ffff888111052640 > Jul 5 01:34:20 hho kernel: RBP: 000000000000000f R08: ffffc90001027be8 R09: 0000000000000040 > Jul 5 01:34:20 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90001027be8 > Jul 5 01:34:20 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001 > Jul 5 01:34:20 hho kernel: FS: 00007f917809a740(0000) GS:ffff8887ef600000(0000) knlGS:0000000000000000 > Jul 5 01:34:20 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Jul 5 01:34:20 hho kernel: CR2: 0000000000000052 CR3: 0000000107562000 CR4: 0000000000350ee0 > Jul 5 01:34:20 hho kernel: Call Trace: > Jul 5 01:34:20 hho kernel: <TASK> > Jul 5 01:34:20 hho kernel: ? __die+0x1f/0x60 > Jul 5 01:34:20 hho kernel: ? page_fault_oops+0x14d/0x410 > Jul 5 01:34:20 hho kernel: ? xa_load+0x82/0xa0 > Jul 5 01:34:20 hho last message buffered 1 times > Jul 5 01:34:20 hho kernel: ? exc_page_fault+0x60/0x100 > Jul 5 01:34:20 hho kernel: ? asm_exc_page_fault+0x22/0x30 > Jul 5 01:34:20 hho kernel: ? wq_worker_comm+0x63/0xc0 > Jul 5 01:34:20 hho last message buffered 1 times > Jul 5 01:34:20 hho kernel: proc_task_name+0xa4/0xb0 > Jul 5 01:34:20 hho kernel: ? seq_put_decimal_ull_width+0x96/0x100 > Jul 5 01:34:20 hho kernel: do_task_stat+0x44b/0xe10 > Jul 5 01:34:20 hho kernel: proc_single_show+0x4b/0xa0 > Jul 5 01:34:20 hho kernel: seq_read_iter+0xff/0x410 > Jul 5 01:34:20 hho kernel: ? generic_fillattr+0x45/0xf0 > Jul 5 01:34:20 hho kernel: seq_read+0x93/0xb0 > Jul 5 01:34:20 hho kernel: vfs_read+0x9b/0x2c0 > Jul 5 01:34:20 hho kernel: ? 
__do_sys_newfstatat+0x22/0x30 > Jul 5 01:34:20 hho kernel: ksys_read+0x53/0xc0 > Jul 5 01:34:20 hho kernel: do_syscall_64+0x35/0x80 > Jul 5 01:34:20 hho kernel: entry_SYSCALL_64_after_hwframe+0x46/0xb0 > Jul 5 01:34:20 hho kernel: RIP: 0033:0x7f91781d677d > Jul 5 01:34:20 hho kernel: Code: b9 fe ff ff 48 8d 3d 1a 71 0a 00 50 e8 2c 12 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d 81 4c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83 > All code > ======== > 0: b9 fe ff ff 48 mov $0x48fffffe,%ecx > 5: 8d 3d 1a 71 0a 00 lea 0xa711a(%rip),%edi # 0xa7125 > b: 50 push %rax > c: e8 2c 12 02 00 call 0x2123d > 11: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) > 18: 00 00 00 > 1b: 66 90 xchg %ax,%ax > 1d: 80 3d 81 4c 0e 00 00 cmpb $0x0,0xe4c81(%rip) # 0xe4ca5 > 24: 74 17 je 0x3d > 26: 31 c0 xor %eax,%eax > 28: 0f 05 syscall > 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction > 30: 77 5b ja 0x8d > 32: c3 ret > 33: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) > 3a: 00 00 00 > 3d: 53 push %rbx > 3e: 48 rex.W > 3f: 83 .byte 0x83 > > Code starting with the faulting instruction > =========================================== > 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax > 6: 77 5b ja 0x63 > 8: c3 ret > 9: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) > 10: 00 00 00 > 13: 53 push %rbx > 14: 48 rex.W > 15: 83 .byte 0x83 > Jul 5 01:34:20 hho kernel: RSP: 002b:00007ffe56a8adb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > Jul 5 01:34:20 hho kernel: RAX: ffffffffffffffda RBX: 0000559458207b40 RCX: 00007f91781d677d > Jul 5 01:34:20 hho kernel: RDX: 0000000000000400 RSI: 0000559458209d30 RDI: 0000000000000004 > Jul 5 01:34:20 hho kernel: RBP: 00007ffe56a8ae20 R08: 00007f9178276cb2 R09: 0000000000000001 > Jul 5 01:34:20 hho kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 00007f91782b04a0 > Jul 5 01:34:20 hho kernel: R13: 000055945820a140 R14: 0000000000000a68 R15: 00007f91782afba0 > Jul 5 01:34:20 hho 
kernel: </TASK> > Jul 5 01:34:20 hho kernel: Modules linked in: sch_fq_codel bpf_preload mousedev snd_ctl_led iwlmvm snd_hda_codec_realtek amdgpu pkcs8_key_parser snd_hda_codec_generic mac80211 libarc4 drm_ttm_helper snd_hda_codec_hdmi ttm iommu_v2 uvcvideo gpu_sched videobuf2_vmalloc i2c_algo_bit videobuf2_memops snd_hda_intel drm_buddy uvc edac_mce_amd snd_intel_dspcfg crct10dif_pclmul videobuf2_v4l2 drm_suballoc_helper crc32_pclmul lm92 snd_hda_codec drm_display_helper crc32c_intel videodev snd_hwdep ghash_clmulni_intel r8169 drivetemp cec sha512_ssse3 thinkpad_acpi snd_hda_core videobuf2_common psmouse realtek iwlwifi drm_kms_helper rapl ledtrig_audio snd_pcm mc serio_raw snd_rn_pci_acp3x platform_profile syscopyarea wmi_bmof mdio_devres k10temp ipmi_devintf snd_timer snd_acp_config sysfillrect cfg80211 drm ucsi_acpi sysimgblt snd snd_soc_acpi libphy i2c_piix4 ipmi_msghandler snd_pci_acp3x typec_ucsi soundcore rfkill video roles typec battery ac wmi i2c_scmi button > Jul 5 01:34:20 hho kernel: CR2: 0000000000000052 > Jul 5 01:34:20 hho kernel: ---[ end trace 0000000000000000 ]--- > Jul 5 01:34:20 hho kernel: RIP: wq_worker_comm+0x63/0xc0 > Jul 5 01:34:20 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 2e 59 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 d9 a3 8b 00 80 7b > All code > ======== > 0: 43 2c 20 rex.XB sub $0x20,%al > 3: 75 1d jne 0x22 > 5: 5b pop %rbx > 6: 5d pop %rbp > 7: 48 c7 c7 e0 a4 43 82 mov $0xffffffff8243a4e0,%rdi > e: 41 5c pop %r12 > 10: 41 5d pop %r13 > 12: 41 5e pop %r14 > 14: e9 2e 59 8b 00 jmp 0x8b5947 > 19: 5b pop %rbx > 1a: 5d pop %rbp > 1b: 41 5c pop %r12 > 1d: 41 5d pop %r13 > 1f: 41 5e pop %r14 > 21: c3 ret > 22: 48 89 df mov %rbx,%rdi > 25: e8 ad 35 00 00 call 0x35d7 > 2a:* 4c 8b 70 48 mov 0x48(%rax),%r14 <-- trapping instruction > 2e: 48 89 c3 mov %rax,%rbx > 31: 4d 85 f6 test %r14,%r14 > 34: 74 cf je 0x5 > 36: 4c 89 f7 mov %r14,%rdi > 
39:   e8 d9 a3 8b 00          call   0x8ba417
>   3e:   80                      .byte 0x80
>   3f:   7b                      .byte 0x7b
>
> Code starting with the faulting instruction
> ===========================================
>    0:   4c 8b 70 48             mov    0x48(%rax),%r14
>    4:   48 89 c3                mov    %rax,%rbx
>    7:   4d 85 f6                test   %r14,%r14
>    a:   74 cf                   je     0xffffffffffffffdb
>    c:   4c 89 f7                mov    %r14,%rdi
>    f:   e8 d9 a3 8b 00          call   0x8ba3ed
>   14:   80                      .byte 0x80
>   15:   7b                      .byte 0x7b
> Jul 5 01:34:20 hho kernel: RSP: 0018:ffffc90001027bb8 EFLAGS: 00010202
> Jul 5 01:34:20 hho kernel: RAX: 000000000000000a RBX: ffff888111052640 RCX: 0001020304050608
> Jul 5 01:34:20 hho kernel: RDX: ffff88810490b300 RSI: 7fffffffffffffff RDI: ffff888111052640
> Jul 5 01:34:20 hho kernel: RBP: 000000000000000f R08: ffffc90001027be8 R09: 0000000000000040
> Jul 5 01:34:20 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90001027be8
> Jul 5 01:34:20 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul 5 01:34:20 hho kernel: FS: 00007f917809a740(0000) GS:ffff8887ef600000(0000) knlGS:0000000000000000
> Jul 5 01:34:20 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 5 01:34:20 hho kernel: CR2: 0000000000000052 CR3: 0000000107562000 CR4: 0000000000350ee0
> Jul 5 01:34:20 hho kernel: note: start-stop-daem[1716] exited with irqs disabled
> Jul 5 01:34:20 hho kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC)
> Jul 5 01:34:21 hho kernel: r8169 0000:02:00.0 eth0: Link is Down
> Jul 5 01:34:23 hho kernel: r8169 0000:02:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> Jul 5 01:34:23 hho kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>
> The crashing process was openrc's start-stop-daemon starting acpid, though I think
> both are just the victims here.
>
> Hope this helps!
>
> cheers
> Holger
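
For anyone who wants to double-check the arithmetic in the analysis quoted above (RAX = 0xa, displacement 0x48, faulting address 0x52), here is a minimal Python sketch. It is not part of the original exchange and is not how decode_stacktrace.sh works internally. The helper names are made up for illustration, and the decoder only understands the one encoding seen in this oops, a REX-prefixed `mov disp8(%base),%reg` with no SIB byte.

```python
# Illustrative sketch only. Helper names are hypothetical; the decoder covers
# just the "REX + 8b /r with an 8-bit displacement" form seen in this oops.

# "Code:" bytes copied from the first oops in this thread.
CODE_LINE = (
    "43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b "
    "8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 "
    "48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b"
)


def trapping_offset(code_line: str) -> int:
    """Return the byte offset of the <xx>-marked byte in a Code: line."""
    for offset, token in enumerate(code_line.split()):
        if token.startswith("<") and token.endswith(">"):
            return offset
    raise ValueError("no <..> marker found in Code: line")


def code_bytes(code_line: str) -> bytes:
    """Convert the hex tokens of a Code: line to raw bytes."""
    return bytes(int(token.strip("<>"), 16) for token in code_line.split())


def fault_address(insn: bytes, base_value: int) -> int:
    """Effective address of 'mov disp8(%base),%reg' for a given base value."""
    rex, opcode, modrm, disp8 = insn[:4]
    if rex & 0xF0 != 0x40 or opcode != 0x8B:
        raise ValueError("not a REX-prefixed 'mov r64, r/m64'")
    if modrm >> 6 != 0b01 or modrm & 0b111 == 0b100:
        raise ValueError("only the disp8, no-SIB form is handled here")
    # Sign-extend the 8-bit displacement.
    disp = disp8 - 0x100 if disp8 & 0x80 else disp8
    return (base_value + disp) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    offset = trapping_offset(CODE_LINE)
    insn = code_bytes(CODE_LINE)[offset:offset + 4]
    rax = 0x000000000000000A  # RAX from the register dump
    print(f"trapping byte offset: {offset:#x}")
    print(f"fault address: {fault_address(insn, rax):#x}")
```

The offset it reports lines up with the `2a:*` marker in the decoded listing, and the computed address matches CR2 in both traces, which is consistent with the callee having returned 0xa in RAX rather than a valid pointer.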