Comment # 6
on bug 110509
from James.Dutton@gmail.com
I think I have found the problem.

[ 657.526313] amdgpu 0000:43:00.0: GPU reset begin!
[ 657.526318] Evicting PASID 32782 queues
[ 667.756000] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:49:crtc-0] hw_done or flip_done timed out

The intention is to do a GPU reset, but the code implements it by simply trying to do a suspend.
Part of the suspend does this:

Apr 29 14:29:19 thread kernel: [ 363.445607] INFO: task kworker/u258:0:55 blocked for more than 120 seconds.
Apr 29 14:29:19 thread kernel: [ 363.445612] Not tainted 5.0.10-dirty #26
Apr 29 14:29:19 thread kernel: [ 363.445613] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 29 14:29:19 thread kernel: [ 363.445615] kworker/u258:0 D 0 55 2 0x80000000
Apr 29 14:29:19 thread kernel: [ 363.445628] Workqueue: events_unbound commit_work [drm_kms_helper]
Apr 29 14:29:19 thread kernel: [ 363.445629] Call Trace:
Apr 29 14:29:19 thread kernel: [ 363.445635] __schedule+0x2c0/0x880
Apr 29 14:29:19 thread kernel: [ 363.445637] schedule+0x2c/0x70
Apr 29 14:29:19 thread kernel: [ 363.445639] schedule_timeout+0x1db/0x360
Apr 29 14:29:19 thread kernel: [ 363.445641] ? update_load_avg+0x8b/0x590
Apr 29 14:29:19 thread kernel: [ 363.445645] dma_fence_default_wait+0x1eb/0x270
Apr 29 14:29:19 thread kernel: [ 363.445647] ? dma_fence_release+0xa0/0xa0
Apr 29 14:29:19 thread kernel: [ 363.445649] dma_fence_wait_timeout+0xfd/0x110
Apr 29 14:29:19 thread kernel: [ 363.445651] reservation_object_wait_timeout_rcu+0x17d/0x370
Apr 29 14:29:19 thread kernel: [ 363.445710] amdgpu_dm_do_flip+0x14a/0x4a0 [amdgpu]
Apr 29 14:29:19 thread kernel: [ 363.445767] amdgpu_dm_atomic_commit_tail+0x7b7/0xc10 [amdgpu]
Apr 29 14:29:19 thread kernel: [ 363.445820] ? amdgpu_dm_atomic_commit_tail+0x7b7/0xc10 [amdgpu]
Apr 29 14:29:19 thread kernel: [ 363.445828] commit_tail+0x42/0x70 [drm_kms_helper]
Apr 29 14:29:19 thread kernel: [ 363.445835] commit_work+0x12/0x20 [drm_kms_helper]
Apr 29 14:29:19 thread kernel: [ 363.445838] process_one_work+0x1fd/0x400
Apr 29 14:29:19 thread kernel: [ 363.445840] worker_thread+0x34/0x410
Apr 29 14:29:19 thread kernel: [ 363.445841] kthread+0x121/0x140
Apr 29 14:29:19 thread kernel: [ 363.445843] ? process_one_work+0x400/0x400
Apr 29 14:29:19 thread kernel: [ 363.445844] ? kthread_park+0x90/0x90
Apr 29 14:29:19 thread kernel: [ 363.445847] ret_from_fork+0x22/0x40

So, amdgpu_dm_do_flip() is the bit that hangs.
If the GPU needs to be reset because part of it has hung, trying a "flip" is unlikely to work.
It is failing/hanging when doing "suspend of IP block <dm>" in amdgpu_device_ip_suspend_phase1().

I would suggest creating code that actually tries to reset the GPU, instead of trying to suspend it while the GPU is hung.
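
To illustrate the point, here is a minimal user-space sketch in C. It is not amdgpu code and not a proposed patch: every name in it (model_fence, model_fence_wait, model_flip_wait, the in_gpu_reset flag, the tick counts) is hypothetical and only models the idea. It assumes the flip path could know that a reset is in progress, and shows two ways to avoid blocking on a fence the hung GPU will never signal: skip the wait entirely during reset, or at least bound it.

```c
#include <stdbool.h>

/* Hypothetical model only; none of these names exist in the amdgpu driver. */
enum wait_result { WAIT_SIGNALED, WAIT_TIMED_OUT, WAIT_SKIPPED };

struct model_fence {
    /* Tick at which the fence signals; negative means it never signals,
     * which models a fence owned by a hung GPU. */
    int signals_after_ticks;
};

/* Bounded wait: poll the fence for at most max_ticks ticks, then give up
 * instead of sleeping indefinitely. */
static enum wait_result model_fence_wait(const struct model_fence *f,
                                         int max_ticks)
{
    for (int tick = 0; tick <= max_ticks; tick++) {
        if (f->signals_after_ticks >= 0 && tick >= f->signals_after_ticks)
            return WAIT_SIGNALED;
    }
    return WAIT_TIMED_OUT;
}

/* Flip path: when a reset is in progress the hardware is assumed hung,
 * so do not wait on its fences at all. */
static enum wait_result model_flip_wait(const struct model_fence *f,
                                        bool in_gpu_reset, int max_ticks)
{
    if (in_gpu_reset)
        return WAIT_SKIPPED;
    return model_fence_wait(f, max_ticks);
}
```

With a fence that never signals, model_flip_wait() returns WAIT_SKIPPED when in_gpu_reset is set and WAIT_TIMED_OUT otherwise; in neither case does the caller block forever, which is what the kworker in the trace above appears to be doing.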