On Tue, Jul 4, 2023 at 2:34 PM Holger Hoffstätte <holger@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2023-07-04 22:10, Suren Baghdasaryan wrote:
> > The fix is posted at
> > https://lore.kernel.org/all/20230704200656.2526715-1-surenb@xxxxxxxxxx/
> > CC'ing stable for inclusion into 6.4.y stable branch.
> >
> > Folks who reported the problem, could you please test and verify the fix?
> > Thanks,
> > Suren.
>
> I applied the fix and did a clean rebuild. The first attempt to boot resulted in
> the following oops, though it kind of continued:
>
> Jul 4 22:35:22 hho kernel: Console: switching to colour frame buffer device 240x67
> Jul 4 22:35:22 hho kernel: amdgpu 0000:06:00.0: [drm] fb0: amdgpudrmfb frame buffer device
> Jul 4 22:35:22 hho kernel: BUG: kernel NULL pointer dereference, address: 0000000000000052
> Jul 4 22:35:22 hho kernel: #PF: supervisor read access in kernel mode
> Jul 4 22:35:22 hho kernel: #PF: error_code(0x0000) - not-present page
> Jul 4 22:35:22 hho kernel: PGD 0 P4D 0
> Jul 4 22:35:22 hho kernel: Oops: 0000 [#1] SMP
> Jul 4 22:35:22 hho kernel: CPU: 10 PID: 1740 Comm: start-stop-daem Not tainted 6.4.1 #1
> Jul 4 22:35:22 hho kernel: Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
> Jul 4 22:35:22 hho kernel: RIP: 0010:wq_worker_comm+0x63/0xc0
> Jul 4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b
> Jul 4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202
> Jul 4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608
> Jul 4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300
> Jul 4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040
> Jul 4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8
> Jul 4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul 4 22:35:22 hho kernel: FS: 00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
> Jul 4 22:35:22 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0
> Jul 4 22:35:22 hho kernel: Call Trace:
> Jul 4 22:35:22 hho kernel: <TASK>
> Jul 4 22:35:22 hho kernel: ? __die+0x1f/0x60
> Jul 4 22:35:22 hho kernel: ? page_fault_oops+0x14d/0x410
> Jul 4 22:35:22 hho kernel: ? xa_load+0x82/0xa0
> Jul 4 22:35:22 hho kernel: ? exc_page_fault+0x60/0x100
> Jul 4 22:35:22 hho kernel: ? asm_exc_page_fault+0x22/0x30
> Jul 4 22:35:22 hho kernel: ? wq_worker_comm+0x63/0xc0
> Jul 4 22:35:22 hho last message buffered 1 times
> Jul 4 22:35:22 hho kernel: proc_task_name+0xa4/0xb0
> Jul 4 22:35:22 hho kernel: ? seq_put_decimal_ull_width+0x96/0x100
> Jul 4 22:35:22 hho kernel: do_task_stat+0x44b/0xe10
> Jul 4 22:35:22 hho kernel: proc_single_show+0x4b/0xa0
> Jul 4 22:35:22 hho kernel: seq_read_iter+0xff/0x410
> Jul 4 22:35:22 hho kernel: ? generic_fillattr+0x45/0xf0
> Jul 4 22:35:22 hho kernel: seq_read+0x93/0xb0
> Jul 4 22:35:22 hho kernel: vfs_read+0x9b/0x2c0
> Jul 4 22:35:22 hho kernel: ? __do_sys_newfstatat+0x22/0x30
> Jul 4 22:35:22 hho kernel: ksys_read+0x53/0xc0
> Jul 4 22:35:22 hho kernel: do_syscall_64+0x35/0x80
> Jul 4 22:35:22 hho kernel: entry_SYSCALL_64_after_hwframe+0x46/0xb0
> Jul 4 22:35:22 hho kernel: RIP: 0033:0x7f39ddf5877d
> Jul 4 22:35:22 hho kernel: Code: b9 fe ff ff 48 8d 3d 1a 71 0a 00 50 e8 2c 12 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d 81 4c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
> Jul 4 22:35:22 hho kernel: RSP: 002b:00007ffe4b98b6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> Jul 4 22:35:22 hho kernel: RAX: ffffffffffffffda RBX: 00005655194cab40 RCX: 00007f39ddf5877d
> Jul 4 22:35:22 hho kernel: RDX: 0000000000000400 RSI: 00005655194ccd30 RDI: 0000000000000004
> Jul 4 22:35:22 hho kernel: RBP: 00007ffe4b98b760 R08: 00007f39ddff8cb2 R09: 0000000000000001
> Jul 4 22:35:22 hho kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 00007f39de0324a0
> Jul 4 22:35:22 hho kernel: R13: 00005655194cd140 R14: 0000000000000a68 R15: 00007f39de031ba0
> Jul 4 22:35:22 hho kernel: </TASK>
> Jul 4 22:35:22 hho kernel: Modules linked in: mousedev sch_fq_codel bpf_preload snd_ctl_led amdgpu iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi mac80211 pkcs8_key_parser drm_ttm_helper ttm iommu_v2 gpu_sched snd_hda_intel libarc4 i2c_algo_bit snd_intel_dspcfg drm_buddy drm_suballoc_helper uvcvideo snd_hda_codec drm_display_helper edac_mce_amd videobuf2_vmalloc snd_hwdep crct10dif_pclmul videobuf2_memops uvc crc32_pclmul cec snd_hda_core crc32c_intel videobuf2_v4l2 ghash_clmulni_intel lm92 r8169 sha512_ssse3 snd_pcm videodev psmouse thinkpad_acpi iwlwifi drivetemp ledtrig_audio drm_kms_helper rapl videobuf2_common realtek snd_timer serio_raw snd_rn_pci_acp3x wmi_bmof platform_profile cfg80211 mc snd_acp_config k10temp snd syscopyarea mdio_devres ucsi_acpi snd_soc_acpi sysfillrect drm snd_pci_acp3x i2c_piix4 sysimgblt soundcore typec_ucsi ipmi_devintf rfkill roles libphy ipmi_msghandler typec video battery ac wmi i2c_scmi button
> Jul 4 22:35:22 hho kernel: CR2: 0000000000000052
> Jul 4 22:35:22 hho kernel: ---[ end trace 0000000000000000 ]---
> Jul 4 22:35:22 hho kernel: RIP: 0010:wq_worker_comm+0x63/0xc0
> Jul 4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b
> Jul 4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202
> Jul 4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608
> Jul 4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300
> Jul 4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040
> Jul 4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8
> Jul 4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul 4 22:35:22 hho kernel: FS: 00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
> Jul 4 22:35:22 hho kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0
> Jul 4 22:35:22 hho kernel: note: start-stop-daem[1740] exited with irqs disabled
> Jul 4 22:35:22 hho kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC)
> Jul 4 22:35:22 hho kernel: r8169 0000:02:00.0 eth0: Link is Down
> Jul 4 22:35:24 hho kernel: r8169 0000:02:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> Jul 4 22:35:24 hho kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>
> It then kind of limped along until I rebooted again. This second attempt to boot
> died and locked up completely, again during amdgpu initialization, and is on display here:
> https://imgur.com/a/3ZE66kh
>
> Finally I just edited mm/Kconfig and set config PER_VMA_LOCK to "defbool n" to override
> any setting in my old config. That made everything work again - it's what I'm using now.

Now I'm completely confused... I've been running my system with this
fix and collecting data the whole morning. Ok, I'll post a dependency
on BROKEN in the evening and will see what this is all about. Thanks!

>
> Happy 4th and fireworks or whatever ¯\(ツ)/¯
>
> cheers
> Holger