[Bug 201957] amdgpu: ring gfx timeout

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Csaba Tímár (csaba.timar01@xxxxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |csaba.timar01@xxxxxxxxx

--- Comment #47 from Csaba Tímár (csaba.timar01@xxxxxxxxx) ---
I see something very similar with my Vega 56. I can reproduce it on Windows 10
too, so I think it's an AMD hardware issue rather than a driver bug. Kernel log
from Linux (5.11.6-1-MANJARO):

march 28 15:07:35 PC-home kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]]
*ERROR* Waiting for fences timed out!
march 28 15:07:35 PC-home kernel: qcm fence wait loop timeout expired
march 28 15:07:35 PC-home kernel: The cp might be in an unrecoverable state due
to an unsuccessful queues preemption
march 28 15:07:35 PC-home kernel: amdgpu: Failed to evict process queues
march 28 15:07:35 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
march 28 15:07:35 PC-home kernel: amdgpu: Failed to quiesce KFD
march 28 15:07:35 PC-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
ring gfx timeout, signaled seq=567492, emitted seq=567494
march 28 15:07:35 PC-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process vkcube pid 7677 thread vkcube pid 7677
march 28 15:07:35 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
march 28 15:07:35 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: Bailing on TDR
for s_job:869c2, as another already in progress
march 28 15:07:36 PC-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
ring page1 timeout, signaled seq=20352, emitted seq=20353
march 28 15:07:36 PC-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process  pid 0 thread  pid 0
march 28 15:07:36 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
march 28 15:07:36 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: Bailing on TDR
for s_job:4f80, as another already in progress
march 28 15:07:39 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: failed to
suspend display audio
march 28 15:07:39 PC-home kernel: BUG: unable to handle page fault for address:
ffffa9c54bb4f910
march 28 15:07:39 PC-home kernel: #PF: supervisor write access in kernel mode
march 28 15:07:39 PC-home kernel: #PF: error_code(0x0002) - not-present page
march 28 15:07:39 PC-home kernel: PGD 100000067 P4D 100000067 PUD 1001b9067 PMD
1cdabb067 PTE 0
march 28 15:07:39 PC-home kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
march 28 15:07:39 PC-home kernel: CPU: 9 PID: 8586 Comm: kworker/9:0 Tainted: G
          OE     5.11.6-1-MANJARO #1
march 28 15:07:39 PC-home kernel: Hardware name: System manufacturer System
Product Name/PRIME A320M-K, BIOS 5603 10/14/2020
march 28 15:07:39 PC-home kernel: Workqueue: events kfd_process_hw_exception
[amdgpu]
march 28 15:07:39 PC-home kernel: RIP: 0010:amdgpu_device_lock_adev+0x2b/0x83
[amdgpu]
march 28 15:07:39 PC-home kernel: Code: 1f 44 00 00 31 c0 ba 01 00 00 00 f0 0f
b1 97 f4 77 01 00 45 31 c0 85 c0 75 64 53 48 89 fb 48 8d bf 00 78 01 00 e8 e7
16 27 c9 <f0> ff 83 40 >
march 28 15:07:39 PC-home kernel: RSP: 0018:ffffa9c54c73be00 EFLAGS: 00010246
march 28 15:07:39 PC-home kernel: RAX: ffff951f0c155dc0 RBX: ffffa9c54bb495d0
RCX: 0000000000000001
march 28 15:07:39 PC-home kernel: RDX: 0000000000000001 RSI: 0000000000000000
RDI: ffffa9c54bb60dd0
march 28 15:07:39 PC-home kernel: RBP: 0000000000000000 R08: 0000000000000000
R09: 0000000000000000
march 28 15:07:39 PC-home kernel: R10: 0000000000000003 R11: 0000000000000000
R12: ffffa9c54bb495d0
march 28 15:07:39 PC-home kernel: R13: ffff951e19160000 R14: ffff951e19170e30
R15: 00000000000000e0
march 28 15:07:39 PC-home kernel: FS:  0000000000000000(0000)
GS:ffff95210ea40000(0000) knlGS:0000000000000000
march 28 15:07:39 PC-home kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
march 28 15:07:39 PC-home kernel: CR2: ffffa9c54bb4f910 CR3: 0000000385410000
CR4: 00000000003506e0
march 28 15:07:39 PC-home kernel: Call Trace:
march 28 15:07:39 PC-home kernel:  amdgpu_device_gpu_recover.cold+0x180/0x95d
[amdgpu]
march 28 15:07:39 PC-home kernel:  ?
amdgpu_device_doorbell_init.part.0+0x71/0xc0 [amdgpu]
march 28 15:07:39 PC-home kernel:  process_one_work+0x214/0x3e0
march 28 15:07:39 PC-home kernel:  worker_thread+0x4d/0x3d0
march 28 15:07:39 PC-home kernel:  ? rescuer_thread+0x3c0/0x3c0
march 28 15:07:39 PC-home kernel:  kthread+0x142/0x160
march 28 15:07:39 PC-home kernel:  ? __kthread_bind_mask+0x60/0x60
march 28 15:07:39 PC-home kernel:  ret_from_fork+0x22/0x30
march 28 15:07:39 PC-home kernel: Modules linked in: rfcomm cmac algif_hash
algif_skcipher af_alg bnep btusb btrtl btbcm btintel bluetooth ecdh_generic ecc
uas usb_storage mousedev>
march 28 15:07:39 PC-home kernel:  gpio_amdpt acpi_cpufreq drm uinput sg fuse
crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2
crc32c_intel xhci_pci
march 28 15:07:39 PC-home kernel: CR2: ffffa9c54bb4f910
march 28 15:07:39 PC-home kernel: ---[ end trace 2eaf88bedaabd891 ]---
march 28 15:07:39 PC-home kernel: RIP: 0010:amdgpu_device_lock_adev+0x2b/0x83
[amdgpu]
march 28 15:07:39 PC-home kernel: Code: 1f 44 00 00 31 c0 ba 01 00 00 00 f0 0f
b1 97 f4 77 01 00 45 31 c0 85 c0 75 64 53 48 89 fb 48 8d bf 00 78 01 00 e8 e7
16 27 c9 <f0> ff 83 40 >
march 28 15:07:39 PC-home kernel: RSP: 0018:ffffa9c54c73be00 EFLAGS: 00010246
march 28 15:07:39 PC-home kernel: RAX: ffff951f0c155dc0 RBX: ffffa9c54bb495d0
RCX: 0000000000000001
march 28 15:07:39 PC-home kernel: RDX: 0000000000000001 RSI: 0000000000000000
RDI: ffffa9c54bb60dd0
march 28 15:07:39 PC-home kernel: RBP: 0000000000000000 R08: 0000000000000000
R09: 0000000000000000
march 28 15:07:39 PC-home kernel: R10: 0000000000000003 R11: 0000000000000000
R12: ffffa9c54bb495d0
march 28 15:07:39 PC-home kernel: R13: ffff951e19160000 R14: ffff951e19170e30
R15: 00000000000000e0
march 28 15:07:39 PC-home kernel: FS:  0000000000000000(0000)
GS:ffff95210ea40000(0000) knlGS:0000000000000000
march 28 15:07:39 PC-home kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
march 28 15:07:39 PC-home kernel: CR2: ffffa9c54bb4f910 CR3: 00000002fa6de000
CR4: 00000000003506e0
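
For anyone comparing logs across reports, the ring timeout lines above can be pulled out with a short script. This is only a rough sketch, not part of the driver or of any official tooling: the function name `ring_timeouts` and the `pending` value (emitted seq minus signaled seq, i.e. fences emitted but not yet signaled) are illustrative choices, and the pattern assumes the message format is exactly as it appears in this log.

```python
import re

# Matches the amdgpu_job_timedout message as it appears in the log above.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def ring_timeouts(log_text):
    """Return (ring, signaled, emitted, pending) for each ring timeout found."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    results = []
    for m in RING_TIMEOUT.finditer(flat):
        signaled = int(m["signaled"])
        emitted = int(m["emitted"])
        # pending = fences emitted but not yet signaled (hypothetical name).
        results.append((m["ring"], signaled, emitted, emitted - signaled))
    return results


sample = """\
march 28 15:07:35 PC-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
ring gfx timeout, signaled seq=567492, emitted seq=567494
march 28 15:07:36 PC-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
ring page1 timeout, signaled seq=20352, emitted seq=20353
"""

for ring, signaled, emitted, pending in ring_timeouts(sample):
    print(f"{ring}: signaled={signaled} emitted={emitted} pending={pending}")
```

On the excerpt above this reports two outstanding fences on the gfx ring and one on page1. Collapsing whitespace before matching is what lets it cope with the line wrapping added by the mail archive.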




