[Bug 213145] AMDGPU resets, timesout and crashes after "*ERROR* Waiting for fences timed out!"

bugzilla-daemon@xxxxxxxxxx · Sat, 12 Nov 2022 16:24:21 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=213145

fmhirtz@xxxxxxxxxx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fmhirtz@xxxxxxxxxx

--- Comment #28 from fmhirtz@xxxxxxxxxx ---
I'm seeing what appears to be this bug on Fedora 37 with an AMD RX 5700 XT.
Under normal desktop use in Wayland/GNOME, the system sporadically freezes and
crashes every couple of days. Given some time, it usually resets back to the
login screen:

Kernel: 6.0.7-301.fc37.x86_64
Mesa: mesa-*23.0.0-0.3.git74bbeb5.fc37

~~~
Nov 08 02:01:33 workstation kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]]
*ERROR* Waiting for fences timed out!
Nov 08 02:01:33 workstation kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
ring gfx_0.0.0 timeout, signaled seq=14613616, emitted seq=14613618
Nov 08 02:01:33 workstation kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process firefox pid 21845 thread firefox:cs0 pid 21922
Nov 08 02:01:33 workstation kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset
begin!
Nov 08 02:01:34 workstation kernel: amdgpu 0000:0c:00.0:
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed
(-110)
Nov 08 02:01:34 workstation kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR*
KGQ disable failed
Nov 08 02:01:34 workstation kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR*
failed to halt cp gfx
Nov 08 02:01:34 workstation kernel: [drm] free PSP TMR buffer
Nov 08 02:01:34 workstation kernel: CPU: 19 PID: 871009 Comm: kworker/u64:3 Not
tainted 5.19.16-301.fc37.x86_64 #1
Nov 08 02:01:34 workstation kernel: Hardware name: MicroElectronics G464/TUF
GAMING X570-PLUS (WI-FI), BIOS 3001 12/04/2020
Nov 08 02:01:34 workstation kernel: Workqueue: amdgpu-reset-dev
drm_sched_job_timedout [gpu_sched]
Nov 08 02:01:34 workstation kernel: Call Trace:
Nov 08 02:01:34 workstation kernel:  <TASK>
Nov 08 02:01:34 workstation kernel:  dump_stack_lvl+0x44/0x5c
Nov 08 02:01:34 workstation kernel:  amdgpu_do_asic_reset+0x26/0x459 [amdgpu]
Nov 08 02:01:34 workstation kernel: 
amdgpu_device_gpu_recover_imp.cold+0x59d/0x8cb [amdgpu]
Nov 08 02:01:34 workstation kernel:  amdgpu_job_timedout+0x156/0x190 [amdgpu]
Nov 08 02:01:34 workstation kernel:  ? __switch_to+0x106/0x430
Nov 08 02:01:34 workstation kernel:  drm_sched_job_timedout+0x76/0x110
[gpu_sched]
Nov 08 02:01:34 workstation kernel:  process_one_work+0x1c7/0x380
Nov 08 02:01:34 workstation kernel:  worker_thread+0x4d/0x380
Nov 08 02:01:34 workstation kernel:  ? _raw_spin_lock_irqsave+0x23/0x50
Nov 08 02:01:34 workstation kernel:  ? process_one_work+0x380/0x380
Nov 08 02:01:34 workstation kernel:  kthread+0xe9/0x110
Nov 08 02:01:34 workstation kernel:  ? kthread_complete_and_exit+0x20/0x20
Nov 08 02:01:34 workstation kernel:  ret_from_fork+0x22/0x30
Nov 08 02:01:34 workstation kernel:  </TASK>
Nov 08 02:01:34 workstation kernel: amdgpu 0000:0c:00.0: amdgpu: BACO reset
Nov 08 02:01:37 workstation kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset
succeeded, trying to resume
Nov 08 02:01:37 workstation kernel: [drm] PCIE GART of 512M enabled (table at
0x0000008000300000).
Nov 08 02:01:37 workstation kernel: [drm] VRAM is lost due to GPU reset!
Nov 08 02:01:37 workstation kernel: [drm] PSP is resuming...
Nov 08 02:01:37 workstation kernel: [drm] reserve 0x900000 from 0x81fe600000
for PSP TMR
...
~~~
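
For anyone comparing logs across reports, here is a minimal Python sketch that
pulls the ring name and fence sequence numbers out of the "ring ... timeout"
line shown above. The function name and the "pending" field are my own,
hypothetical choices, and reading emitted minus signaled as the number of
submissions that had not completed is my assumption, not something the log
states. The regex is modeled only on the message format in this log and may
need adjusting for other kernel versions.

```python
import re

# Modeled on the amdgpu_job_timedout message format seen in the log above.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(log_text):
    """Return one dict per ring timeout message found in log_text."""
    # Mail clients wrap long log lines, so collapse all whitespace runs first.
    flat = " ".join(log_text.split())
    results = []
    for match in RING_TIMEOUT.finditer(flat):
        signaled = int(match["signaled"])
        emitted = int(match["emitted"])
        results.append({
            "ring": match["ring"],
            "signaled": signaled,
            "emitted": emitted,
            # Assumption: the gap is the count of not-yet-completed submissions.
            "pending": emitted - signaled,
        })
    return results


sample = """\
Nov 08 02:01:33 workstation kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
ring gfx_0.0.0 timeout, signaled seq=14613616, emitted seq=14613618
"""

for hit in find_ring_timeouts(sample):
    print(hit)
```

The same function can be fed the saved output of `journalctl -k`, whether or
not the lines have been wrapped.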

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
