Hi guys. Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode instead I see black screen with constantly glowing cursor. Demonstration: https://youtu.be/rGL4LsHMae4 In the kernel logs there are references to hung processes: [ 149.363465] rfkill: input handler disabled [ 249.072478] INFO: task (brt-dbus):1645 blocked for more than 122 seconds. [ 249.072515] Tainted: G W L -------- --- 5.19.0-0.rc0.20220526gitbabf0bb978e3.4.fc37.x86_64 #1 [ 249.072520] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 249.072524] task:(brt-dbus) state:D stack:14384 pid: 1645 ppid: 1 flags:0x00000002 [ 249.072536] Call Trace: [ 249.072540] <TASK> [ 249.072551] __schedule+0x492/0x1640 [ 249.072560] ? lock_is_held_type+0xe8/0x140 [ 249.072569] ? find_held_lock+0x32/0x80 [ 249.072584] schedule+0x4e/0xb0 [ 249.072591] schedule_preempt_disabled+0x14/0x20 [ 249.072597] __mutex_lock+0x423/0x890 [ 249.072608] ? amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.072818] ? amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.073010] amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.073207] amdgpu_flush+0x25/0x40 [amdgpu] [ 249.074088] filp_close+0x31/0x70 [ 249.074097] __close_range+0x130/0x320 [ 249.074108] __x64_sys_close_range+0x13/0x20 [ 249.074113] do_syscall_64+0x5b/0x80 [ 249.074120] ? lockdep_hardirqs_on+0x7d/0x100 [ 249.074127] ? do_syscall_64+0x67/0x80 [ 249.074135] ? do_syscall_64+0x67/0x80 [ 249.074140] ? lockdep_hardirqs_on+0x7d/0x100 [ 249.074147] ? do_syscall_64+0x67/0x80 [ 249.074154] ? lock_is_held_type+0xe8/0x140 [ 249.074164] ? asm_exc_page_fault+0x27/0x30 [ 249.074171] ? lockdep_hardirqs_on+0x7d/0x100 [ 249.074178] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 249.074184] RIP: 0033:0x7fd71f54f97b [ 249.074208] RSP: 002b:00007fffc8e752a8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 [ 249.074215] RAX: ffffffffffffffda RBX: 00007fffc8e752b0 RCX: 00007fd71f54f97b [ 249.074220] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000027 [ 249.074224] RBP: 00007fffc8e75330 R08: 0000000000000000 R09: 00007fffc8e75380 [ 249.074228] R10: 00007fffc8e751f0 R11: 0000000000000246 R12: 0000000000000002 [ 249.074232] R13: 00007fffc8e75340 R14: 0000000000000000 R15: 0000000000000002 [ 249.074252] </TASK> [ 249.074261] INFO: task (ostnamed):1718 blocked for more than 122 seconds. [ 249.074266] Tainted: G W L -------- --- 5.19.0-0.rc0.20220526gitbabf0bb978e3.4.fc37.x86_64 #1 [ 249.074285] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 249.074289] task:(ostnamed) state:D stack:14552 pid: 1718 ppid: 1 flags:0x00000006 [ 249.074299] Call Trace: [ 249.074302] <TASK> [ 249.074310] __schedule+0x492/0x1640 [ 249.074316] ? lock_is_held_type+0xe8/0x140 [ 249.074324] ? find_held_lock+0x32/0x80 [ 249.074339] schedule+0x4e/0xb0 [ 249.074346] schedule_preempt_disabled+0x14/0x20 [ 249.074352] __mutex_lock+0x423/0x890 [ 249.074361] ? amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.074564] ? amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.074754] amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.074950] amdgpu_flush+0x25/0x40 [amdgpu] [ 249.075133] filp_close+0x31/0x70 [ 249.075140] __close_range+0x130/0x320 [ 249.075150] __x64_sys_close_range+0x13/0x20 [ 249.075154] do_syscall_64+0x5b/0x80 [ 249.075164] ? lock_is_held_type+0xe8/0x140 [ 249.075175] ? do_syscall_64+0x67/0x80 [ 249.075180] ? lockdep_hardirqs_on+0x7d/0x100 [ 249.075187] ? do_syscall_64+0x67/0x80 [ 249.075194] ? lock_is_held_type+0xe8/0x140 [ 249.075204] ? asm_exc_page_fault+0x27/0x30 [ 249.075210] ? lockdep_hardirqs_on+0x7d/0x100 [ 249.075217] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 249.075222] RIP: 0033:0x7fd71f54f97b [ 249.075231] RSP: 002b:00007fffc8e752a8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 [ 249.075237] RAX: ffffffffffffffda RBX: 00007fffc8e752b0 RCX: 00007fd71f54f97b [ 249.075241] RDX: 0000000000000000 RSI: 00000000000000b9 RDI: 0000000000000027 [ 249.075245] RBP: 00007fffc8e75330 R08: 0000000000000000 R09: 00007fffc8e75380 [ 249.075249] R10: 00007fffc8e751f0 R11: 0000000000000246 R12: 0000000000000004 [ 249.075253] R13: 00007fffc8e75340 R14: 0000000000000000 R15: 0000000000000003 [ 249.075289] </TASK> [ 249.075294] INFO: task (pcscd):1749 blocked for more than 122 seconds. [ 249.075298] Tainted: G W L -------- --- 5.19.0-0.rc0.20220526gitbabf0bb978e3.4.fc37.x86_64 #1 [ 249.075302] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 249.075306] task:(pcscd) state:D stack:14256 pid: 1749 ppid: 1 flags:0x00000002 [ 249.075314] Call Trace: [ 249.075318] <TASK> [ 249.075325] __schedule+0x492/0x1640 [ 249.075331] ? lock_is_held_type+0xe8/0x140 [ 249.075339] ? find_held_lock+0x32/0x80 [ 249.075353] schedule+0x4e/0xb0 [ 249.075360] schedule_preempt_disabled+0x14/0x20 [ 249.075365] __mutex_lock+0x423/0x890 [ 249.075375] ? amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.075574] ? amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.075764] amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.075960] amdgpu_flush+0x25/0x40 [amdgpu] [ 249.076143] filp_close+0x31/0x70 [ 249.076150] __close_range+0x130/0x320 [ 249.076160] __x64_sys_close_range+0x13/0x20 [ 249.076164] do_syscall_64+0x5b/0x80 [ 249.076169] ? do_syscall_64+0x67/0x80 [ 249.076175] ? lockdep_hardirqs_on+0x7d/0x100 [ 249.076182] ? do_syscall_64+0x67/0x80 [ 249.076188] ? do_syscall_64+0x67/0x80 [ 249.076194] ? lockdep_hardirqs_on+0x7d/0x100 [ 249.076201] ? do_syscall_64+0x67/0x80 [ 249.076206] ? do_syscall_64+0x67/0x80 [ 249.076211] ? lockdep_hardirqs_on+0x7d/0x100 [ 249.076218] ? do_syscall_64+0x67/0x80 [ 249.076223] ? lock_is_held_type+0xe8/0x140 [ 249.076233] ? asm_exc_page_fault+0x27/0x30 [ 249.076239] ? lockdep_hardirqs_on+0x7d/0x100 [ 249.076246] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 249.076251] RIP: 0033:0x7fd71f54f97b [ 249.076259] RSP: 002b:00007fffc8e752a8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 [ 249.076265] RAX: ffffffffffffffda RBX: 00007fffc8e752b0 RCX: 00007fd71f54f97b [ 249.076287] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 000000000000004c [ 249.076291] RBP: 00007fffc8e75330 R08: 0000000000000000 R09: 00007fffc8e75380 [ 249.076295] R10: 00007fffc8e751f0 R11: 0000000000000246 R12: 0000000000000003 [ 249.076300] R13: 00007fffc8e75340 R14: 0000000000000000 R15: 0000000000000003 [ 249.076319] </TASK> [ 249.076323] Showing all locks held in the system: [ 249.076335] 1 lock held by khungtaskd/183: [ 249.076340] #0: ffffffff84169060 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x16b [ 249.076364] 3 locks held by systemd-journal/868: [ 249.076376] 3 locks held by gnome-shell/1626: [ 249.076380] #0: ffff9f2b248e4680 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x3c/0x880 [ 249.076394] #1: ffff9f2b248e4728 (&sig->exec_update_lock){++++}-{3:3}, at: begin_new_exec+0x384/0xcc0 [ 249.076407] #2: ffff9f2b3a95ec58 (&mgr->lock#3){+.+.}-{3:3}, at: amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.076609] 1 lock held by (brt-dbus)/1645: [ 249.076613] #0: ffff9f2b3a95ec58 (&mgr->lock#3){+.+.}-{3:3}, at: amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.076814] 1 lock held by (ostnamed)/1718: [ 249.076818] #0: ffff9f2b3a95ec58 (&mgr->lock#3){+.+.}-{3:3}, at: amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.077018] 1 lock held by (pcscd)/1749: [ 249.077022] #0: ffff9f2b3a95ec58 (&mgr->lock#3){+.+.}-{3:3}, at: amdgpu_ctx_mgr_entity_flush+0x32/0xc0 [amdgpu] [ 249.077226] ============================================= [ 335.093113] kworker/dying (297) used greatest stack depth: 11608 bytes left [ 335.093254] kworker/dying (241) used greatest stack depth: 11360 bytes left Full kernel log is here: https://pastebin.com/0YHs6wyB Naturally, I tried to find the problematic commit via git bisect. It was the longest bisect in my life, I needed to collect the core 565 times and it took three weeks. This is what explains why I am writing only now, and not immediately. The most annoying thing is that it looks like I wasted three weeks because the exact commit was never found. My bisect log can be found here: https://pastebin.com/AhLMNfyv If you open it you will see a lot of skip steps. This is due to the fact that in these steps I observe a problem when loading the kernel hangs on the messages on screen: [drm] amdgpu kernel modesetting enabled. amdgpu: Ignoring ACPI CRAT on non-APU system amdgpu: Virtual CRAT table created for CPU amdgpu: Topology: Add CPU node Here is photo of boot screen: https://i.postimg.cc/DwVbYP4b/IMG-20220525-130140.jpg And the following trace is written to the log: [ 8.173558] [drm] amdgpu kernel modesetting enabled. [ 8.196766] amdgpu: Ignoring ACPI CRAT on non-APU system [ 8.196846] amdgpu: Virtual CRAT table created for CPU [ 8.197015] amdgpu: Topology: Add CPU node [ 8.201791] Console: switching to colour dummy device 80x25 [ 8.215200] page:00000000b17305fd refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1029c00 [ 8.215224] head:00000000b17305fd order:0 compound_mapcount:-6459 compound_pincount:0 [ 8.215243] flags: 0x17ffffc0010000(head|node=0|zone=2|lastcpupid=0x1fffff) [ 8.215261] raw: 0017ffffc0010000 ffffe6c480a70008 ffffe6c480a70008 0000000000000000 [ 8.215279] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 8.215296] page dumped because: VM_BUG_ON_PAGE(compound && compound_order(page) != order) [ 8.215324] ------------[ cut here ]------------ [ 8.215340] kernel BUG at mm/page_alloc.c:1329! [ 8.215358] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 8.215363] CPU: 20 PID: 584 Comm: systemd-udevd Tainted: G W 5.18.0-rc1-004-c6ed9f66eb70aeaac9998bd3552ada740d90e20c+ #357 [ 8.215370] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022 [ 8.215375] RIP: 0010:free_pcp_prepare+0x455/0x650 [ 8.215381] Code: ff ff 48 8b 43 48 a8 01 0f 84 48 ff ff ff 48 83 e8 01 48 39 c3 0f 84 3b ff ff ff 48 c7 c6 08 f0 85 aa 48 89 df e8 5b cb fc ff <0f> 0b 4c 89 ef 48 89 14 24 41 83 c6 01 e8 b9 ed ff ff 48 8b 14 24 [ 8.215390] RSP: 0018:ffffbb7dc23779d8 EFLAGS: 00010296 [ 8.215394] RAX: 000000000000004e RBX: ffffe6c480a70000 RCX: 0000000000000000 [ 8.215399] RDX: 0000000000000001 RSI: ffffffffaa89db77 RDI: 00000000ffffffff [ 8.215402] RBP: 0000000000000009 R08: 0000000000000000 R09: ffffbb7dc23777c0 [ 8.215406] R10: 0000000000000003 R11: ffffa08bae1fefe8 R12: 0000000000000000 [ 8.215410] R13: ffffa07c817eadc0 R14: 00000000fffffe00 R15: ffffe6c480a70000 [ 8.215414] FS: 00007f35b2f1ab40(0000) GS:ffffa08b5d200000(0000) knlGS:0000000000000000 [ 8.215419] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.215422] CR2: 00005631caec1878 CR3: 000000017d09c000 CR4: 0000000000350ee0 [ 8.215427] Call Trace: [ 8.215429] <TASK> [ 8.215431] ? find_held_lock+0x32/0x80 [ 8.215436] free_unref_page+0x25/0x280 [ 8.215440] __vunmap+0x261/0x3d0 [ 8.215444] drm_fbdev_cleanup+0x6b/0xc0 [ 8.215449] drm_fbdev_fb_destroy+0x15/0x30 [ 8.215453] unregister_framebuffer+0x2e/0x40 [ 8.215458] drm_client_dev_unregister+0x6e/0xe0 [ 8.215464] drm_dev_unregister+0x34/0x90 [ 8.215467] drm_dev_unplug+0x24/0x40 [ 8.215471] simpledrm_remove+0x11/0x20 [ 8.215475] platform_remove+0x1f/0x40 [ 8.215479] device_release_driver_internal+0x1b8/0x220 [ 8.215484] bus_remove_device+0xef/0x160 [ 8.215488] device_del+0x18c/0x3f0 [ 8.215492] platform_device_del.part.0+0x13/0x70 [ 8.215496] platform_device_unregister+0x1c/0x30 [ 8.215500] drm_aperture_detach_drivers+0xa3/0xd0 [ 8.215505] drm_aperture_remove_conflicting_pci_framebuffers+0x3f/0x70 [ 8.215511] amdgpu_pci_probe+0x126/0x3c0 [amdgpu] [ 8.215672] local_pci_probe+0x41/0x80 [ 8.215677] pci_device_probe+0xaa/0x200 [ 8.215681] really_probe+0x1a0/0x370 [ 8.215685] __driver_probe_device+0xfb/0x170 [ 8.215689] driver_probe_device+0x1f/0x90 [ 8.215693] __driver_attach+0xbe/0x1a0 [ 8.215697] ? __device_attach_driver+0xe0/0xe0 [ 8.215701] bus_for_each_dev+0x65/0x90 [ 8.215705] bus_add_driver+0x150/0x1f0 [ 8.215709] driver_register+0x89/0xd0 [ 8.215713] ? 0xffffffffc044e000 [ 8.215719] do_one_initcall+0x69/0x350 [ 8.215724] ? do_init_module+0x22/0x260 [ 8.215728] ? rcu_read_lock_sched_held+0x3b/0x70 [ 8.215732] ? trace_kmalloc+0x3b/0x100 [ 8.215737] ? kmem_cache_alloc_trace+0x1eb/0x3a0 [ 8.215742] do_init_module+0x4a/0x260 [ 8.215745] __do_sys_finit_module+0x93/0xf0 [ 8.215751] do_syscall_64+0x3a/0x80 [ 8.215756] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 8.215761] RIP: 0033:0x7f35b3acb62d [ 8.215765] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 c7 0c 00 f7 d8 64 89 01 48 [ 8.215773] RSP: 002b:00007ffc39f6ef68 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 8.215778] RAX: ffffffffffffffda RBX: 00005631cae55830 RCX: 00007f35b3acb62d [ 8.215782] RDX: 0000000000000000 RSI: 00005631cae6ceb0 RDI: 0000000000000011 [ 8.215786] RBP: 00005631cae6ceb0 R08: 0000000000000000 R09: 00007f35b3b98c80 [ 8.215790] R10: 0000000000000011 R11: 0000000000000246 R12: 0000000000020000 [ 8.215794] R13: 00005631cae74660 R14: 0000000000000000 R15: 00005631cae805d0 [ 8.215800] </TASK> [ 8.215801] Modules linked in: amdgpu(+) drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul iommu_v2 crc32c_intel gpu_sched ucsi_ccg nvme drm_buddy typec_ucsi ghash_clmulni_intel igb ccp drm_dp_helper typec sp5100_tco nvme_core dca wmi ip6_tables ip_tables ipmi_devintf ipmi_msghandler fuse [ 8.215825] ---[ end trace 0000000000000000 ]--- [ 8.215828] RIP: 0010:free_pcp_prepare+0x455/0x650 [ 8.215832] Code: ff ff 48 8b 43 48 a8 01 0f 84 48 ff ff ff 48 83 e8 01 48 39 c3 0f 84 3b ff ff ff 48 c7 c6 08 f0 85 aa 48 89 df e8 5b cb fc ff <0f> 0b 4c 89 ef 48 89 14 24 41 83 c6 01 e8 b9 ed ff ff 48 8b 14 24 [ 8.215841] RSP: 0018:ffffbb7dc23779d8 EFLAGS: 00010296 [ 8.215844] RAX: 000000000000004e RBX: ffffe6c480a70000 RCX: 0000000000000000 [ 8.215848] RDX: 0000000000000001 RSI: ffffffffaa89db77 RDI: 00000000ffffffff [ 8.215852] RBP: 0000000000000009 R08: 0000000000000000 R09: ffffbb7dc23777c0 [ 8.215856] R10: 0000000000000003 R11: ffffa08bae1fefe8 R12: 0000000000000000 [ 8.215860] R13: ffffa07c817eadc0 R14: 00000000fffffe00 R15: ffffe6c480a70000 [ 8.215864] FS: 00007f35b2f1ab40(0000) GS:ffffa08b5d200000(0000) knlGS:0000000000000000 [ 8.215875] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.215879] CR2: 00005631caec1878 CR3: 000000017d09c000 CR4: 0000000000350ee0 [ 8.216344] systemd-udevd (584) used greatest stack depth: 12776 bytes left Full kernel log is here: https://pastebin.com/rDAjKpSg Please help me get rid of the bug that crashes systemd-udevd so I can find the exact commit that caused the GPU hang. Or, based on the trace of the hung process, help fix the problem. Thank you all in advance. UPD: I am still observing the issue rc1-rc4 :( My hardware specs: GPU: 6900XT CPU: 3950X M/B: ROG Strix X570-I Gaming RAM: 64GB SSD: Intel Optane 905P -- Best Regards, Mike Gavrilov.