The stack trace is an expected part of the reset procedure, so that is OK.
The issue you are having is a hang in one of the GPU jobs during resume,
which triggers a GPU reset attempt.
You can open a ticket for this issue at
https://gitlab.freedesktop.org/drm/amd/-/issues; please attach the full
dmesg log.
Andrey
On 2022-07-26 05:06, Tom Cook wrote:
I have a Ryzen 7 3700U in an HP laptop. lspci describes the GPU in this way:
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile Series] (rev c1)
This laptop has never successfully resumed from suspend (I have tried
every 5.x kernel). On 5.18.0, the system appears to be fine after
resume apart from the GPU, which usually gives a blank screen and
occasionally scrambled output. After rebooting, I see this in syslog:
Jul 25 11:02:18 frog kernel: [240782.968674] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
Jul 25 11:02:19 frog kernel: [240783.974891] amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jul 25 11:02:19 frog kernel: [240783.988650] [drm] free PSP TMR buffer
Jul 25 11:02:19 frog kernel: [240784.019057] CPU: 4 PID: 305612 Comm: kworker/u32:17 Not tainted 5.18.0 #1
Jul 25 11:02:19 frog kernel: [240784.019063] Hardware name: HP HP ENVY x360 Convertible 15-ds0xxx/85DD, BIOS F.20 05/28/2020
Jul 25 11:02:19 frog kernel: [240784.019067] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Jul 25 11:02:19 frog kernel: [240784.019079] Call Trace:
Jul 25 11:02:19 frog kernel: [240784.019082] <TASK>
Jul 25 11:02:19 frog kernel: [240784.019085] dump_stack_lvl+0x49/0x5f
Jul 25 11:02:19 frog kernel: [240784.019095] dump_stack+0x10/0x12
Jul 25 11:02:19 frog kernel: [240784.019099] amdgpu_do_asic_reset+0x2f/0x4e0 [amdgpu]
Jul 25 11:02:19 frog kernel: [240784.019278] amdgpu_device_gpu_recover_imp+0x41e/0xb50 [amdgpu]
Jul 25 11:02:19 frog kernel: [240784.019452] amdgpu_job_timedout+0x155/0x1b0 [amdgpu]
Jul 25 11:02:19 frog kernel: [240784.019674] drm_sched_job_timedout+0x74/0xf0 [gpu_sched]
Jul 25 11:02:19 frog kernel: [240784.019681] ? amdgpu_cgs_destroy_device+0x10/0x10 [amdgpu]
Jul 25 11:02:19 frog kernel: [240784.019896] ? drm_sched_job_timedout+0x74/0xf0 [gpu_sched]
Jul 25 11:02:19 frog kernel: [240784.019903] process_one_work+0x227/0x440
Jul 25 11:02:19 frog kernel: [240784.019908] worker_thread+0x31/0x3d0
Jul 25 11:02:19 frog kernel: [240784.019912] ? process_one_work+0x440/0x440
Jul 25 11:02:19 frog kernel: [240784.019914] kthread+0xfe/0x130
Jul 25 11:02:19 frog kernel: [240784.019918] ? kthread_complete_and_exit+0x20/0x20
Jul 25 11:02:19 frog kernel: [240784.019923] ret_from_fork+0x22/0x30
Jul 25 11:02:19 frog kernel: [240784.019930] </TASK>
Jul 25 11:02:19 frog kernel: [240784.019934] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
Jul 25 11:02:19 frog kernel: [240784.020178] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
Jul 25 11:02:19 frog kernel: [240784.020552] [drm] PCIE GART of 1024M enabled.
Jul 25 11:02:19 frog kernel: [240784.020555] [drm] PTB located at 0x000000F400900000
Jul 25 11:02:19 frog kernel: [240784.020577] [drm] VRAM is lost due to GPU reset!
Jul 25 11:02:19 frog kernel: [240784.020579] [drm] PSP is resuming...
Jul 25 11:02:19 frog kernel: [240784.040465] [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
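A minimal Python sketch for pulling the reset sequence out of a saved kernel log, assuming each kernel message sits on a single line as it does in the syslog file itself. The marker strings are copied from this log and may be worded differently on other kernel versions; the helper names reset_timeline and decode_failure are hypothetical. It also decodes the negative code in the ring test failure: on Linux, errno 110 is ETIMEDOUT, so the KIQ ring test timed out during the reset.

```python
import errno
import re

# Marker strings copied from the amdgpu log in this thread (hypothetical list;
# other kernel versions may word these messages differently).
MILESTONES = (
    "GPU reset begin!",
    "test failed",
    "MODE2 reset",
    "GPU reset succeeded",
    "VRAM is lost",
)

# printk's "[240782.968674]" stamp: seconds since boot.
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")
# Negative return code printed in parentheses, e.g. "test failed (-110)".
ERRCODE = re.compile(r"failed \((-\d+)\)")


def reset_timeline(log_text):
    """Return (seconds_since_boot, line) for each line containing a milestone."""
    events = []
    for line in log_text.splitlines():
        if any(marker in line for marker in MILESTONES):
            stamp = STAMP.search(line)
            if stamp:
                events.append((float(stamp.group(1)), line.strip()))
    return events


def decode_failure(line):
    """Map a code such as (-110) to its errno name.

    Names come from the host's errno table, so run this on Linux to match
    the kernel's values.
    """
    match = ERRCODE.search(line)
    if match is None:
        return None
    code = -int(match.group(1))
    return errno.errorcode.get(code, str(code))


if __name__ == "__main__":
    sample = (
        "Jul 25 11:02:18 frog kernel: [240782.968674] amdgpu 0000:04:00.0: "
        "amdgpu: GPU reset begin!\n"
        "Jul 25 11:02:19 frog kernel: [240783.974891] amdgpu 0000:04:00.0: "
        "[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 "
        "test failed (-110)\n"
    )
    events = reset_timeline(sample)
    for stamp, line in events:
        print(f"{stamp:.6f}  {line}")
    # Time between "reset begin" and the failed ring test.
    print(f"ring test failed after {events[1][0] - events[0][0]:.3f} s")
    print("ring test error:", decode_failure(events[1][1]))
```

This only summarizes the sequence; the ticket should still get the full, unfiltered dmesg log.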
I'm running the latest BIOS from HP. Is there anything I can do to
work around this? Or anything I can do to help debug it?
Regards,
Tom Cook