https://bugzilla.kernel.org/show_bug.cgi?id=217514

Bug ID: 217514
Summary: [amdgpu] system doesn't boot after linux-firmware 2023-05-23 ffe1a41e
Product: Drivers
Version: 2.5
Hardware: All
OS: Linux
Status: NEW
Severity: normal
Priority: P3
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@xxxxxxxxxxxxxxxxxxxx
Reporter: rly@xxxxxxxxxx
Regression: No

Created attachment 304361
  --> https://bugzilla.kernel.org/attachment.cgi?id=304361&action=edit
softlockup

Updating linux-firmware to the latest git version causes my PC to lock up during boot. I have a 3900X paired with a 7900 XTX, running Arch Linux with the 6.3.4 xanmod kernel (this also happens with the kernel from the core repo) and mesa 23.1.1, if that matters.

During boot I see the following error and the system locks up completely; only a hard reset helps:

`May 31 07:20:40 valhalla kernel: watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [swapper/5:0]`

It is accompanied by a lot of amdgpu errors in the journal (both are followed by a stack trace):

```
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:9 pasid:32768, for process pid 0 thread pid 0)
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: in page starting at address 0x0000ffff0021a000 from client 10
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00900831
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: Faulty UTCL2 client ID: CPF (0x4)
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: MORE_FAULTS: 0x1
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: WALKER_ERROR: 0x0
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: PERMISSION_FAULTS: 0x3
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: MAPPING_ERROR: 0x0
May 31 07:20:44 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: RW: 0x0
```

The full journal log is in "softlockup".
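As a cross-check on the page fault lines, the decoded fields can be reproduced from the raw `GCVM_L2_PROTECTION_FAULT_STATUS` value. The following is a minimal Python sketch, not driver code. The bit offsets in `FIELDS` and the helper name `decode_fault_status` are assumptions: the layout was picked because it matches the values printed in the log, and it should be checked against the register headers for the ASIC in question.

```python
# Assumed bit layout of GCVM_L2_PROTECTION_FAULT_STATUS as (shift, width).
# These offsets are an assumption chosen to match the fields printed in the
# log; verify them against the register headers for your ASIC.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict:
    """Split a raw status word into named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw value taken from the journal excerpt.
    for name, value in decode_fault_status(0x00900831).items():
        print(f"{name}: {value:#x}")
```

Under that assumed layout, 0x00900831 decodes to client ID 0x4, PERMISSION_FAULTS 0x3 and VMID 9, which agrees with the "CPF (0x4)" and "vmid:9" entries in the log.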
The issues start with [this commit, ffe1a41e](https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=ffe1a41e2ddbc39109b12d95dcac282d90eba8fc), though not with the soft lockup described above. Instead, after the initramfs loads, I get the BIOS splash screen back and the system stays stuck there. There are different amdgpu errors (followed by a stack trace) during this:

```
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to enable requested dpm features!
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to setup smc hw!
May 31 09:18:37 valhalla kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <smu> failed -62
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
May 31 09:18:37 valhalla kernel: amdgpu 0000:0c:00.0: amdgpu: amdgpu: finishing device.
```

The logs from this boot are in "amdgpu_error". Note that towards the end the system appears to be running, but since I only saw the BIOS splash screen, I rebooted via SysRq (REISUB).

The commit after ffe1a41e ([56832557](https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=568325574a3b6148f3296984aa24fcd1fb4b912c), or possibly the one after that, [39d6fcc](https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=39d6fcc73100ae4aeeec0194bbf102c672673edd); I'm not sure at the moment) gets past the splash screen, but that is where the soft lockup starts to happen.

--
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.