Re: [PATCH 1/1] mm: disable CONFIG_PER_VMA_LOCK by default until its fixed

On Tue, Jul 4, 2023 at 4:59 PM Holger Hoffstätte
<holger@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2023-07-05 00:42, Matthew Wilcox wrote:
> > On Tue, Jul 04, 2023 at 11:34:27PM +0200, Holger Hoffstätte wrote:
> >> I applied the fix and did a clean rebuild. The first attempt to boot resulted in
> >> the following oops, though it kind of continued:
> >
> > It would be helpful to run this through decode_stacktrace.sh
> >
> >> Jul  4 22:35:22 hho kernel: BUG: kernel NULL pointer dereference, address: 0000000000000052
> >> Jul  4 22:35:22 hho kernel: #PF: supervisor read access in kernel mode
> >> Jul  4 22:35:22 hho kernel: #PF: error_code(0x0000) - not-present page
> >> Jul  4 22:35:22 hho kernel: PGD 0 P4D 0
> >> Jul  4 22:35:22 hho kernel: Oops: 0000 [#1] SMP
> >> Jul  4 22:35:22 hho kernel: CPU: 10 PID: 1740 Comm: start-stop-daem Not tainted 6.4.1 #1
> >> Jul  4 22:35:22 hho kernel: Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
> >> Jul  4 22:35:22 hho kernel: RIP: 0010:wq_worker_comm+0x63/0xc0
> >> Jul  4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b
> >
> > Faulting insn:
> >
> >     0:        4c 8b 70 48             mov    0x48(%rax),%r14
> >
> > and rax is 0xa, which matches up with 0x52 as the faulting address.
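
The match between rax and the faulting address can be checked by hand. The sketch below is illustrative only: it decodes just this one instruction form (a REX-prefixed mov with opcode 0x8b and an 8-bit displacement), and the helper and table names are made up for this note, not taken from any kernel tool.

```python
def decode_mov_disp8(insn: bytes):
    """Decode a REX-prefixed 'mov disp8(%base),%dest' (opcode 0x8b, mod=01).

    Hypothetical helper for this note; handles only this one encoding.
    Returns (dest_reg_number, base_reg_number, signed_displacement).
    """
    rex, opcode, modrm, disp8 = insn
    if rex & 0xF0 != 0x40 or opcode != 0x8B:
        raise ValueError("not a REX-prefixed 0x8b mov")
    if modrm >> 6 != 0b01 or modrm & 0x7 == 0b100:
        raise ValueError("only the mod=01, no-SIB form is handled")
    dest = ((modrm >> 3) & 0x7) | ((rex & 0x4) << 1)  # REX.R extends the reg field
    base = (modrm & 0x7) | ((rex & 0x1) << 3)         # REX.B extends the r/m field
    disp = disp8 - 0x100 if disp8 >= 0x80 else disp8  # disp8 is sign-extended
    return dest, base, disp


REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Bytes of the trapping instruction, taken from the "Code:" line of the oops.
dest, base, disp = decode_mov_disp8(bytes.fromhex("4c8b7048"))

rax = 0xA                # RAX value from the register dump
fault_addr = rax + disp  # effective address of the load

print(f"mov {disp:#x}(%{REG_NAMES[base]}),%{REG_NAMES[dest]} -> {fault_addr:#x}")
```

The computed address, 0x52, is the value reported in CR2, so the load went through a bogus pointer of 0xa rather than a true NULL.
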
> >
> > I'm not sure this is related to the VMA patches.  It might be something
> > unrelated that doesn't often come up?
>
> See below for the reveal!
>
> >> Jul  4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202
> >> Jul  4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608
> >> Jul  4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300
> >> Jul  4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040
> >> Jul  4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8
> >> Jul  4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> >> Jul  4 22:35:22 hho kernel: FS:  00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
> >> Jul  4 22:35:22 hho kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> Jul  4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0
> >> Jul  4 22:35:22 hho kernel: Call Trace:
> >> Jul  4 22:35:22 hho kernel:  <TASK>
> >> Jul  4 22:35:22 hho kernel:  ? __die+0x1f/0x60
> >> Jul  4 22:35:22 hho kernel:  ? page_fault_oops+0x14d/0x410
> >> Jul  4 22:35:22 hho kernel:  ? xa_load+0x82/0xa0
> >> Jul  4 22:35:22 hho kernel:  ? exc_page_fault+0x60/0x100
> >> Jul  4 22:35:22 hho kernel:  ? asm_exc_page_fault+0x22/0x30
> >> Jul  4 22:35:22 hho kernel:  ? wq_worker_comm+0x63/0xc0
> >> Jul  4 22:35:22 hho last message buffered 1 times
> >> Jul  4 22:35:22 hho kernel:  proc_task_name+0xa4/0xb0
> >> Jul  4 22:35:22 hho kernel:  ? seq_put_decimal_ull_width+0x96/0x100
> >> Jul  4 22:35:22 hho kernel:  do_task_stat+0x44b/0xe10
> >> Jul  4 22:35:22 hho kernel:  proc_single_show+0x4b/0xa0
> >> Jul  4 22:35:22 hho kernel:  seq_read_iter+0xff/0x410
> >> Jul  4 22:35:22 hho kernel:  ? generic_fillattr+0x45/0xf0
> >> Jul  4 22:35:22 hho kernel:  seq_read+0x93/0xb0
> >> Jul  4 22:35:22 hho kernel:  vfs_read+0x9b/0x2c0
> >> Jul  4 22:35:22 hho kernel:  ? __do_sys_newfstatat+0x22/0x30
> >> Jul  4 22:35:22 hho kernel:  ksys_read+0x53/0xc0
> >> Jul  4 22:35:22 hho kernel:  do_syscall_64+0x35/0x80
> >> Jul  4 22:35:22 hho kernel:  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> >> Jul  4 22:35:22 hho kernel: RIP: 0033:0x7f39ddf5877d
> >> Jul  4 22:35:22 hho kernel: Code: b9 fe ff ff 48 8d 3d 1a 71 0a 00 50 e8 2c 12 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d 81 4c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
> >> Jul  4 22:35:22 hho kernel: RSP: 002b:00007ffe4b98b6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> >> Jul  4 22:35:22 hho kernel: RAX: ffffffffffffffda RBX: 00005655194cab40 RCX: 00007f39ddf5877d
> >> Jul  4 22:35:22 hho kernel: RDX: 0000000000000400 RSI: 00005655194ccd30 RDI: 0000000000000004
> >> Jul  4 22:35:22 hho kernel: RBP: 00007ffe4b98b760 R08: 00007f39ddff8cb2 R09: 0000000000000001
> >> Jul  4 22:35:22 hho kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 00007f39de0324a0
> >> Jul  4 22:35:22 hho kernel: R13: 00005655194cd140 R14: 0000000000000a68 R15: 00007f39de031ba0
> >> Jul  4 22:35:22 hho kernel:  </TASK>
> >> Jul  4 22:35:22 hho kernel: Modules linked in: mousedev sch_fq_codel bpf_preload snd_ctl_led amdgpu iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi mac80211 pkcs8_key_parser drm_ttm_helper ttm iommu_v2 gpu_sched snd_hda_intel libarc4 i2c_algo_bit snd_intel_dspcfg drm_buddy drm_suballoc_helper uvcvideo snd_hda_codec drm_display_helper edac_mce_amd videobuf2_vmalloc snd_hwdep crct10dif_pclmul videobuf2_memops uvc crc32_pclmul cec snd_hda_core crc32c_intel videobuf2_v4l2 ghash_clmulni_intel lm92 r8169 sha512_ssse3 snd_pcm videodev psmouse thinkpad_acpi iwlwifi drivetemp ledtrig_audio drm_kms_helper rapl videobuf2_common realtek snd_timer serio_raw snd_rn_pci_acp3x wmi_bmof platform_profile cfg80211 mc snd_acp_config k10temp snd syscopyarea mdio_devres ucsi_acpi snd_soc_acpi sysfillrect drm snd_pci_acp3x i2c_piix4 sysimgblt soundcore typec_ucsi ipmi_devintf rfkill roles libphy ipmi_msghandler typec video battery ac wmi i2c_scmi button
> >> Jul  4 22:35:22 hho kernel: CR2: 0000000000000052
> >> Jul  4 22:35:22 hho kernel: ---[ end trace 0000000000000000 ]---
> >> Jul  4 22:35:22 hho kernel: RIP: 0010:wq_worker_comm+0x63/0xc0
> >> Jul  4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b
> >> Jul  4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202
> >> Jul  4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608
> >> Jul  4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300
> >> Jul  4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040
> >> Jul  4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8
> >> Jul  4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> >> Jul  4 22:35:22 hho kernel: FS:  00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
> >> Jul  4 22:35:22 hho kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> Jul  4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0
> >> Jul  4 22:35:22 hho kernel: note: start-stop-daem[1740] exited with irqs disabled
> >> Jul  4 22:35:22 hho kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC)
> >> Jul  4 22:35:22 hho kernel: r8169 0000:02:00.0 eth0: Link is Down
> >> Jul  4 22:35:24 hho kernel: r8169 0000:02:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> >> Jul  4 22:35:24 hho kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> >>
> >> It then kind of limped along until I rebooted again. This second attempt to boot
> >> died and locked up completely, again during amdgpu initialization, and is on display here:
> >> https://imgur.com/a/3ZE66kh
> >
> > refill_obj_stock() is also somewhat unrelated to VMA stuff.  This is
> > all very bizarre.
> >
> >> Finally I just edited mm/Kconfig and set config PER_VMA_LOCK to "def_bool n" to override
> >> any setting in my old config. That made everything work again - it's what I'm using now.
> >
> > Could I ask you to try a few boots with PER_VMA_LOCK set to "n", just
> > to eliminate the possibility that this is a coincidence?
> >
>
> HOLY SMOKES! You are on to something! I wanted to do 10 reboots and didn't expect
> anything to happen since this has been working fine since forever, and I don't boot
> that often since suspend is quite reliable these days. It did 9 without problems and
> then on the 10th reboot it crapped out, again with the xa_load pagefault.

Ok, sounds like the results of the fix are inconclusive. I guess we
should wait for more testing before concluding whether the fix is
valid.
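
For a rough sense of why a single series of boots cannot settle this, here is a small sketch. It assumes each boot fails independently with a fixed probability, which is a simplification, and the failure rates it uses are placeholders, not measurements from this report.

```python
import math


def p_clean_run(p_fail: float, boots: int) -> float:
    """Probability of seeing `boots` clean boots in a row when each boot
    fails independently with probability `p_fail`."""
    return (1.0 - p_fail) ** boots


def boots_needed(p_fail: float, confidence: float) -> int:
    """Smallest number of consecutive clean boots such that, if the bug were
    still present at rate `p_fail`, such a run would occur with probability
    at most 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


# Placeholder per-boot failure rates, chosen only for illustration.
for p_fail in (0.05, 0.1, 0.2):
    print(f"p_fail={p_fail}: "
          f"P(9 clean boots)={p_clean_run(p_fail, 9):.3f}, "
          f"boots for 95% confidence={boots_needed(p_fail, 0.95)}")
```

Under these assumptions, a bug that hits one boot in ten still allows nine clean boots in a row more than a third of the time, so a handful of good boots says little either way.
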
In the meantime, per Andrew's request, I posted a patchset that
includes both the fix and a proper kill switch for the feature at
https://lore.kernel.org/all/20230705063711.2670599-1-surenb@xxxxxxxxxx/.

Thanks,
Suren.

>
> Here's the first trace:
>
> holger>/tmp/linux-6.4.1/scripts/decode_stacktrace.sh /boot/kernel-genkernel-x86_64-6.4.1 < /tmp/kern.log
> Jul  4 22:35:22 hho kernel: [drm] Initialized amdgpu 3.52.0 20150101 for 0000:06:00.0 on minor 0
> Jul  4 22:35:22 hho kernel: fbcon: amdgpudrmfb (fb0) is primary device
> Jul  4 22:35:22 hho kernel: [drm] DSC precompute is not needed.
> Jul  4 22:35:22 hho kernel: Console: switching to colour frame buffer device 240x67
> Jul  4 22:35:22 hho kernel: amdgpu 0000:06:00.0: [drm] fb0: amdgpudrmfb frame buffer device
> Jul  4 22:35:22 hho kernel: BUG: kernel NULL pointer dereference, address: 0000000000000052
> Jul  4 22:35:22 hho kernel: #PF: supervisor read access in kernel mode
> Jul  4 22:35:22 hho kernel: #PF: error_code(0x0000) - not-present page
> Jul  4 22:35:22 hho kernel: PGD 0 P4D 0
> Jul  4 22:35:22 hho kernel: Oops: 0000 [#1] SMP
> Jul  4 22:35:22 hho kernel: CPU: 10 PID: 1740 Comm: start-stop-daem Not tainted 6.4.1 #1
> Jul  4 22:35:22 hho kernel: Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
> Jul 4 22:35:22 hho kernel: RIP: wq_worker_comm+0x63/0xc0
> Jul 4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b
> All code
> ========
>     0:  43 2c 20                rex.XB sub $0x20,%al
>     3:  75 1d                   jne    0x22
>     5:  5b                      pop    %rbx
>     6:  5d                      pop    %rbp
>     7:  48 c7 c7 e0 a4 43 82    mov    $0xffffffff8243a4e0,%rdi
>     e:  41 5c                   pop    %r12
>    10:  41 5d                   pop    %r13
>    12:  41 5e                   pop    %r14
>    14:  e9 7e 6b 8b 00          jmp    0x8b6b97
>    19:  5b                      pop    %rbx
>    1a:  5d                      pop    %rbp
>    1b:  41 5c                   pop    %r12
>    1d:  41 5d                   pop    %r13
>    1f:  41 5e                   pop    %r14
>    21:  c3                      ret
>    22:  48 89 df                mov    %rbx,%rdi
>    25:  e8 ad 35 00 00          call   0x35d7
>    2a:* 4c 8b 70 48             mov    0x48(%rax),%r14          <-- trapping instruction
>    2e:  48 89 c3                mov    %rax,%rbx
>    31:  4d 85 f6                test   %r14,%r14
>    34:  74 cf                   je     0x5
>    36:  4c 89 f7                mov    %r14,%rdi
>    39:  e8 29 b6 8b 00          call   0x8bb667
>    3e:  80                      .byte 0x80
>    3f:  7b                      .byte 0x7b
>
> Code starting with the faulting instruction
> ===========================================
>     0:  4c 8b 70 48             mov    0x48(%rax),%r14
>     4:  48 89 c3                mov    %rax,%rbx
>     7:  4d 85 f6                test   %r14,%r14
>     a:  74 cf                   je     0xffffffffffffffdb
>     c:  4c 89 f7                mov    %r14,%rdi
>     f:  e8 29 b6 8b 00          call   0x8bb63d
>    14:  80                      .byte 0x80
>    15:  7b                      .byte 0x7b
> Jul  4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202
> Jul  4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608
> Jul  4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300
> Jul  4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040
> Jul  4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8
> Jul  4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul  4 22:35:22 hho kernel: FS:  00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
> Jul  4 22:35:22 hho kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul  4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0
> Jul  4 22:35:22 hho kernel: Call Trace:
> Jul  4 22:35:22 hho kernel:  <TASK>
> Jul 4 22:35:22 hho kernel: ? __die+0x1f/0x60
> Jul 4 22:35:22 hho kernel: ? page_fault_oops+0x14d/0x410
> Jul 4 22:35:22 hho kernel: ? xa_load+0x82/0xa0
> Jul 4 22:35:22 hho kernel: ? exc_page_fault+0x60/0x100
> Jul 4 22:35:22 hho kernel: ? asm_exc_page_fault+0x22/0x30
> Jul 4 22:35:22 hho kernel: ? wq_worker_comm+0x63/0xc0
> Jul  4 22:35:22 hho last message buffered 1 times
> Jul 4 22:35:22 hho kernel: proc_task_name+0xa4/0xb0
> Jul 4 22:35:22 hho kernel: ? seq_put_decimal_ull_width+0x96/0x100
> Jul 4 22:35:22 hho kernel: do_task_stat+0x44b/0xe10
> Jul 4 22:35:22 hho kernel: proc_single_show+0x4b/0xa0
> Jul 4 22:35:22 hho kernel: seq_read_iter+0xff/0x410
> Jul 4 22:35:22 hho kernel: ? generic_fillattr+0x45/0xf0
> Jul 4 22:35:22 hho kernel: seq_read+0x93/0xb0
> Jul 4 22:35:22 hho kernel: vfs_read+0x9b/0x2c0
> Jul 4 22:35:22 hho kernel: ? __do_sys_newfstatat+0x22/0x30
> Jul 4 22:35:22 hho kernel: ksys_read+0x53/0xc0
> Jul 4 22:35:22 hho kernel: do_syscall_64+0x35/0x80
> Jul 4 22:35:22 hho kernel: entry_SYSCALL_64_after_hwframe+0x46/0xb0
> Jul  4 22:35:22 hho kernel: RIP: 0033:0x7f39ddf5877d
> Jul 4 22:35:22 hho kernel: Code: b9 fe ff ff 48 8d 3d 1a 71 0a 00 50 e8 2c 12 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d 81 4c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
> All code
> ========
>     0:  b9 fe ff ff 48          mov    $0x48fffffe,%ecx
>     5:  8d 3d 1a 71 0a 00       lea    0xa711a(%rip),%edi        # 0xa7125
>     b:  50                      push   %rax
>     c:  e8 2c 12 02 00          call   0x2123d
>    11:  66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
>    18:  00 00 00
>    1b:  66 90                   xchg   %ax,%ax
>    1d:  80 3d 81 4c 0e 00 00    cmpb   $0x0,0xe4c81(%rip)        # 0xe4ca5
>    24:  74 17                   je     0x3d
>    26:  31 c0                   xor    %eax,%eax
>    28:  0f 05                   syscall
>    2a:* 48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax         <-- trapping instruction
>    30:  77 5b                   ja     0x8d
>    32:  c3                      ret
>    33:  66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
>    3a:  00 00 00
>    3d:  53                      push   %rbx
>    3e:  48                      rex.W
>    3f:  83                      .byte 0x83
>
> Code starting with the faulting instruction
> ===========================================
>     0:  48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
>     6:  77 5b                   ja     0x63
>     8:  c3                      ret
>     9:  66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
>    10:  00 00 00
>    13:  53                      push   %rbx
>    14:  48                      rex.W
>    15:  83                      .byte 0x83
> Jul  4 22:35:22 hho kernel: RSP: 002b:00007ffe4b98b6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> Jul  4 22:35:22 hho kernel: RAX: ffffffffffffffda RBX: 00005655194cab40 RCX: 00007f39ddf5877d
> Jul  4 22:35:22 hho kernel: RDX: 0000000000000400 RSI: 00005655194ccd30 RDI: 0000000000000004
> Jul  4 22:35:22 hho kernel: RBP: 00007ffe4b98b760 R08: 00007f39ddff8cb2 R09: 0000000000000001
> Jul  4 22:35:22 hho kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 00007f39de0324a0
> Jul  4 22:35:22 hho kernel: R13: 00005655194cd140 R14: 0000000000000a68 R15: 00007f39de031ba0
> Jul  4 22:35:22 hho kernel:  </TASK>
> Jul  4 22:35:22 hho kernel: Modules linked in: mousedev sch_fq_codel bpf_preload snd_ctl_led amdgpu iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi mac80211 pkcs8_key_parser drm_ttm_helper ttm iommu_v2 gpu_sched snd_hda_intel libarc4 i2c_algo_bit snd_intel_dspcfg drm_buddy drm_suballoc_helper uvcvideo snd_hda_codec drm_display_helper edac_mce_amd videobuf2_vmalloc snd_hwdep crct10dif_pclmul videobuf2_memops uvc crc32_pclmul cec snd_hda_core crc32c_intel videobuf2_v4l2 ghash_clmulni_intel lm92 r8169 sha512_ssse3 snd_pcm videodev psmouse thinkpad_acpi iwlwifi drivetemp ledtrig_audio drm_kms_helper rapl videobuf2_common realtek snd_timer serio_raw snd_rn_pci_acp3x wmi_bmof platform_profile cfg80211 mc snd_acp_config k10temp snd syscopyarea mdio_devres ucsi_acpi snd_soc_acpi sysfillrect drm snd_pci_acp3x i2c_piix4 sysimgblt soundcore typec_ucsi ipmi_devintf rfkill roles libphy ipmi_msghandler typec video battery ac wmi i2c_scmi button
> Jul  4 22:35:22 hho kernel: CR2: 0000000000000052
> Jul  4 22:35:22 hho kernel: ---[ end trace 0000000000000000 ]---
> Jul 4 22:35:22 hho kernel: RIP: wq_worker_comm+0x63/0xc0
> Jul 4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b
> All code
> ========
>     0:  43 2c 20                rex.XB sub $0x20,%al
>     3:  75 1d                   jne    0x22
>     5:  5b                      pop    %rbx
>     6:  5d                      pop    %rbp
>     7:  48 c7 c7 e0 a4 43 82    mov    $0xffffffff8243a4e0,%rdi
>     e:  41 5c                   pop    %r12
>    10:  41 5d                   pop    %r13
>    12:  41 5e                   pop    %r14
>    14:  e9 7e 6b 8b 00          jmp    0x8b6b97
>    19:  5b                      pop    %rbx
>    1a:  5d                      pop    %rbp
>    1b:  41 5c                   pop    %r12
>    1d:  41 5d                   pop    %r13
>    1f:  41 5e                   pop    %r14
>    21:  c3                      ret
>    22:  48 89 df                mov    %rbx,%rdi
>    25:  e8 ad 35 00 00          call   0x35d7
>    2a:* 4c 8b 70 48             mov    0x48(%rax),%r14          <-- trapping instruction
>    2e:  48 89 c3                mov    %rax,%rbx
>    31:  4d 85 f6                test   %r14,%r14
>    34:  74 cf                   je     0x5
>    36:  4c 89 f7                mov    %r14,%rdi
>    39:  e8 29 b6 8b 00          call   0x8bb667
>    3e:  80                      .byte 0x80
>    3f:  7b                      .byte 0x7b
>
> Code starting with the faulting instruction
> ===========================================
>     0:  4c 8b 70 48             mov    0x48(%rax),%r14
>     4:  48 89 c3                mov    %rax,%rbx
>     7:  4d 85 f6                test   %r14,%r14
>     a:  74 cf                   je     0xffffffffffffffdb
>     c:  4c 89 f7                mov    %r14,%rdi
>     f:  e8 29 b6 8b 00          call   0x8bb63d
>    14:  80                      .byte 0x80
>    15:  7b                      .byte 0x7b
> Jul  4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202
> Jul  4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608
> Jul  4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300
> Jul  4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040
> Jul  4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8
> Jul  4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul  4 22:35:22 hho kernel: FS:  00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
> Jul  4 22:35:22 hho kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul  4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0
> Jul  4 22:35:22 hho kernel: note: start-stop-daem[1740] exited with irqs disabled
> Jul  4 22:35:22 hho kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC)
> Jul  4 22:35:22 hho kernel: r8169 0000:02:00.0 eth0: Link is Down
> Jul  4 22:35:24 hho kernel: r8169 0000:02:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> Jul  4 22:35:24 hho kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>
> Here is the second one from the reboot bonanza:
>
> holger>/tmp/linux-6.4.1/scripts/decode_stacktrace.sh /boot/kernel-genkernel-x86_64-6.4.1 < /tmp/kern.log
> Jul  5 01:34:20 hho kernel: [drm] Initialized amdgpu 3.52.0 20150101 for 0000:06:00.0 on minor 0
> Jul  5 01:34:20 hho kernel: fbcon: amdgpudrmfb (fb0) is primary device
> Jul  5 01:34:20 hho kernel: [drm] DSC precompute is not needed.
> Jul  5 01:34:20 hho kernel: Console: switching to colour frame buffer device 240x67
> Jul  5 01:34:20 hho kernel: amdgpu 0000:06:00.0: [drm] fb0: amdgpudrmfb frame buffer device
> Jul  5 01:34:20 hho kernel: BUG: kernel NULL pointer dereference, address: 0000000000000052
> Jul  5 01:34:20 hho kernel: #PF: supervisor read access in kernel mode
> Jul  5 01:34:20 hho kernel: #PF: error_code(0x0000) - not-present page
> Jul  5 01:34:20 hho kernel: PGD 0 P4D 0
> Jul  5 01:34:20 hho kernel: Oops: 0000 [#1] SMP
> Jul  5 01:34:20 hho kernel: CPU: 8 PID: 1716 Comm: start-stop-daem Not tainted 6.4.1 #1
> Jul  5 01:34:20 hho kernel: Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
> Jul 5 01:34:20 hho kernel: RIP: wq_worker_comm+0x63/0xc0
> Jul 5 01:34:20 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 2e 59 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 d9 a3 8b 00 80 7b
> All code
> ========
>     0:  43 2c 20                rex.XB sub $0x20,%al
>     3:  75 1d                   jne    0x22
>     5:  5b                      pop    %rbx
>     6:  5d                      pop    %rbp
>     7:  48 c7 c7 e0 a4 43 82    mov    $0xffffffff8243a4e0,%rdi
>     e:  41 5c                   pop    %r12
>    10:  41 5d                   pop    %r13
>    12:  41 5e                   pop    %r14
>    14:  e9 2e 59 8b 00          jmp    0x8b5947
>    19:  5b                      pop    %rbx
>    1a:  5d                      pop    %rbp
>    1b:  41 5c                   pop    %r12
>    1d:  41 5d                   pop    %r13
>    1f:  41 5e                   pop    %r14
>    21:  c3                      ret
>    22:  48 89 df                mov    %rbx,%rdi
>    25:  e8 ad 35 00 00          call   0x35d7
>    2a:* 4c 8b 70 48             mov    0x48(%rax),%r14          <-- trapping instruction
>    2e:  48 89 c3                mov    %rax,%rbx
>    31:  4d 85 f6                test   %r14,%r14
>    34:  74 cf                   je     0x5
>    36:  4c 89 f7                mov    %r14,%rdi
>    39:  e8 d9 a3 8b 00          call   0x8ba417
>    3e:  80                      .byte 0x80
>    3f:  7b                      .byte 0x7b
>
> Code starting with the faulting instruction
> ===========================================
>     0:  4c 8b 70 48             mov    0x48(%rax),%r14
>     4:  48 89 c3                mov    %rax,%rbx
>     7:  4d 85 f6                test   %r14,%r14
>     a:  74 cf                   je     0xffffffffffffffdb
>     c:  4c 89 f7                mov    %r14,%rdi
>     f:  e8 d9 a3 8b 00          call   0x8ba3ed
>    14:  80                      .byte 0x80
>    15:  7b                      .byte 0x7b
> Jul  5 01:34:20 hho kernel: RSP: 0018:ffffc90001027bb8 EFLAGS: 00010202
> Jul  5 01:34:20 hho kernel: RAX: 000000000000000a RBX: ffff888111052640 RCX: 0001020304050608
> Jul  5 01:34:20 hho kernel: RDX: ffff88810490b300 RSI: 7fffffffffffffff RDI: ffff888111052640
> Jul  5 01:34:20 hho kernel: RBP: 000000000000000f R08: ffffc90001027be8 R09: 0000000000000040
> Jul  5 01:34:20 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90001027be8
> Jul  5 01:34:20 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul  5 01:34:20 hho kernel: FS:  00007f917809a740(0000) GS:ffff8887ef600000(0000) knlGS:0000000000000000
> Jul  5 01:34:20 hho kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul  5 01:34:20 hho kernel: CR2: 0000000000000052 CR3: 0000000107562000 CR4: 0000000000350ee0
> Jul  5 01:34:20 hho kernel: Call Trace:
> Jul  5 01:34:20 hho kernel:  <TASK>
> Jul 5 01:34:20 hho kernel: ? __die+0x1f/0x60
> Jul 5 01:34:20 hho kernel: ? page_fault_oops+0x14d/0x410
> Jul 5 01:34:20 hho kernel: ? xa_load+0x82/0xa0
> Jul  5 01:34:20 hho last message buffered 1 times
> Jul 5 01:34:20 hho kernel: ? exc_page_fault+0x60/0x100
> Jul 5 01:34:20 hho kernel: ? asm_exc_page_fault+0x22/0x30
> Jul 5 01:34:20 hho kernel: ? wq_worker_comm+0x63/0xc0
> Jul  5 01:34:20 hho last message buffered 1 times
> Jul 5 01:34:20 hho kernel: proc_task_name+0xa4/0xb0
> Jul 5 01:34:20 hho kernel: ? seq_put_decimal_ull_width+0x96/0x100
> Jul 5 01:34:20 hho kernel: do_task_stat+0x44b/0xe10
> Jul 5 01:34:20 hho kernel: proc_single_show+0x4b/0xa0
> Jul 5 01:34:20 hho kernel: seq_read_iter+0xff/0x410
> Jul 5 01:34:20 hho kernel: ? generic_fillattr+0x45/0xf0
> Jul 5 01:34:20 hho kernel: seq_read+0x93/0xb0
> Jul 5 01:34:20 hho kernel: vfs_read+0x9b/0x2c0
> Jul 5 01:34:20 hho kernel: ? __do_sys_newfstatat+0x22/0x30
> Jul 5 01:34:20 hho kernel: ksys_read+0x53/0xc0
> Jul 5 01:34:20 hho kernel: do_syscall_64+0x35/0x80
> Jul 5 01:34:20 hho kernel: entry_SYSCALL_64_after_hwframe+0x46/0xb0
> Jul  5 01:34:20 hho kernel: RIP: 0033:0x7f91781d677d
> Jul 5 01:34:20 hho kernel: Code: b9 fe ff ff 48 8d 3d 1a 71 0a 00 50 e8 2c 12 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d 81 4c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
> All code
> ========
>     0:  b9 fe ff ff 48          mov    $0x48fffffe,%ecx
>     5:  8d 3d 1a 71 0a 00       lea    0xa711a(%rip),%edi        # 0xa7125
>     b:  50                      push   %rax
>     c:  e8 2c 12 02 00          call   0x2123d
>    11:  66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
>    18:  00 00 00
>    1b:  66 90                   xchg   %ax,%ax
>    1d:  80 3d 81 4c 0e 00 00    cmpb   $0x0,0xe4c81(%rip)        # 0xe4ca5
>    24:  74 17                   je     0x3d
>    26:  31 c0                   xor    %eax,%eax
>    28:  0f 05                   syscall
>    2a:* 48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax         <-- trapping instruction
>    30:  77 5b                   ja     0x8d
>    32:  c3                      ret
>    33:  66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
>    3a:  00 00 00
>    3d:  53                      push   %rbx
>    3e:  48                      rex.W
>    3f:  83                      .byte 0x83
>
> Code starting with the faulting instruction
> ===========================================
>     0:  48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
>     6:  77 5b                   ja     0x63
>     8:  c3                      ret
>     9:  66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
>    10:  00 00 00
>    13:  53                      push   %rbx
>    14:  48                      rex.W
>    15:  83                      .byte 0x83
> Jul  5 01:34:20 hho kernel: RSP: 002b:00007ffe56a8adb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> Jul  5 01:34:20 hho kernel: RAX: ffffffffffffffda RBX: 0000559458207b40 RCX: 00007f91781d677d
> Jul  5 01:34:20 hho kernel: RDX: 0000000000000400 RSI: 0000559458209d30 RDI: 0000000000000004
> Jul  5 01:34:20 hho kernel: RBP: 00007ffe56a8ae20 R08: 00007f9178276cb2 R09: 0000000000000001
> Jul  5 01:34:20 hho kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 00007f91782b04a0
> Jul  5 01:34:20 hho kernel: R13: 000055945820a140 R14: 0000000000000a68 R15: 00007f91782afba0
> Jul  5 01:34:20 hho kernel:  </TASK>
> Jul  5 01:34:20 hho kernel: Modules linked in: sch_fq_codel bpf_preload mousedev snd_ctl_led iwlmvm snd_hda_codec_realtek amdgpu pkcs8_key_parser snd_hda_codec_generic mac80211 libarc4 drm_ttm_helper snd_hda_codec_hdmi ttm iommu_v2 uvcvideo gpu_sched videobuf2_vmalloc i2c_algo_bit videobuf2_memops snd_hda_intel drm_buddy uvc edac_mce_amd snd_intel_dspcfg crct10dif_pclmul videobuf2_v4l2 drm_suballoc_helper crc32_pclmul lm92 snd_hda_codec drm_display_helper crc32c_intel videodev snd_hwdep ghash_clmulni_intel r8169 drivetemp cec sha512_ssse3 thinkpad_acpi snd_hda_core videobuf2_common psmouse realtek iwlwifi drm_kms_helper rapl ledtrig_audio snd_pcm mc serio_raw snd_rn_pci_acp3x platform_profile syscopyarea wmi_bmof mdio_devres k10temp ipmi_devintf snd_timer snd_acp_config sysfillrect cfg80211 drm ucsi_acpi sysimgblt snd snd_soc_acpi libphy i2c_piix4 ipmi_msghandler snd_pci_acp3x typec_ucsi soundcore rfkill video roles typec battery ac wmi i2c_scmi button
> Jul  5 01:34:20 hho kernel: CR2: 0000000000000052
> Jul  5 01:34:20 hho kernel: ---[ end trace 0000000000000000 ]---
> Jul 5 01:34:20 hho kernel: RIP: wq_worker_comm+0x63/0xc0
> Jul 5 01:34:20 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 2e 59 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 d9 a3 8b 00 80 7b
> All code
> ========
>     0:  43 2c 20                rex.XB sub $0x20,%al
>     3:  75 1d                   jne    0x22
>     5:  5b                      pop    %rbx
>     6:  5d                      pop    %rbp
>     7:  48 c7 c7 e0 a4 43 82    mov    $0xffffffff8243a4e0,%rdi
>     e:  41 5c                   pop    %r12
>    10:  41 5d                   pop    %r13
>    12:  41 5e                   pop    %r14
>    14:  e9 2e 59 8b 00          jmp    0x8b5947
>    19:  5b                      pop    %rbx
>    1a:  5d                      pop    %rbp
>    1b:  41 5c                   pop    %r12
>    1d:  41 5d                   pop    %r13
>    1f:  41 5e                   pop    %r14
>    21:  c3                      ret
>    22:  48 89 df                mov    %rbx,%rdi
>    25:  e8 ad 35 00 00          call   0x35d7
>    2a:* 4c 8b 70 48             mov    0x48(%rax),%r14          <-- trapping instruction
>    2e:  48 89 c3                mov    %rax,%rbx
>    31:  4d 85 f6                test   %r14,%r14
>    34:  74 cf                   je     0x5
>    36:  4c 89 f7                mov    %r14,%rdi
>    39:  e8 d9 a3 8b 00          call   0x8ba417
>    3e:  80                      .byte 0x80
>    3f:  7b                      .byte 0x7b
>
> Code starting with the faulting instruction
> ===========================================
>     0:  4c 8b 70 48             mov    0x48(%rax),%r14
>     4:  48 89 c3                mov    %rax,%rbx
>     7:  4d 85 f6                test   %r14,%r14
>     a:  74 cf                   je     0xffffffffffffffdb
>     c:  4c 89 f7                mov    %r14,%rdi
>     f:  e8 d9 a3 8b 00          call   0x8ba3ed
>    14:  80                      .byte 0x80
>    15:  7b                      .byte 0x7b
> Jul  5 01:34:20 hho kernel: RSP: 0018:ffffc90001027bb8 EFLAGS: 00010202
> Jul  5 01:34:20 hho kernel: RAX: 000000000000000a RBX: ffff888111052640 RCX: 0001020304050608
> Jul  5 01:34:20 hho kernel: RDX: ffff88810490b300 RSI: 7fffffffffffffff RDI: ffff888111052640
> Jul  5 01:34:20 hho kernel: RBP: 000000000000000f R08: ffffc90001027be8 R09: 0000000000000040
> Jul  5 01:34:20 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90001027be8
> Jul  5 01:34:20 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul  5 01:34:20 hho kernel: FS:  00007f917809a740(0000) GS:ffff8887ef600000(0000) knlGS:0000000000000000
> Jul  5 01:34:20 hho kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul  5 01:34:20 hho kernel: CR2: 0000000000000052 CR3: 0000000107562000 CR4: 0000000000350ee0
> Jul  5 01:34:20 hho kernel: note: start-stop-daem[1716] exited with irqs disabled
> Jul  5 01:34:20 hho kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC)
> Jul  5 01:34:21 hho kernel: r8169 0000:02:00.0 eth0: Link is Down
> Jul  5 01:34:23 hho kernel: r8169 0000:02:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> Jul  5 01:34:23 hho kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>
> The crashing process was openrc's start-stop-daemon starting acpid, though I think
> both are just the victims here.
>
> Hope this helps!
>
> cheers
> Holger




