https://bugzilla.kernel.org/show_bug.cgi?id=206389

Bug ID: 206389
Summary: ambgpu crashes randomly
Product: Drivers
Version: 2.5
Kernel Version: 5.4.16
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@xxxxxxxxxxxxxxxxxxxx
Reporter: rob@xxxxxxxxxxxxxx
Regression: No

Created attachment 287075
  --> https://bugzilla.kernel.org/attachment.cgi?id=287075&action=edit
big crash log file

The driver crashes randomly at least once every few hours. Sometimes when left idle, sometimes when just using chrome. I haven't found a way to reliably reproduce the issue.

After the crash the screen is full of artefacts and the only way forward is to restart the PC.

Backtrace #1:

Feb 02 10:29:49 trudex kernel: [drm:gfx_v9_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command stream
Feb 02 10:29:49 trudex kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=386096, emitted seq=386097
Feb 02 10:29:49 trudex kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_x11 pid 895 thread kwin_x11:cs0 pid 1006
Feb 02 10:29:49 trudex kernel: amdgpu 0000:03:00.0: GPU reset begin!
Feb 02 10:29:50 trudex kernel: ------------[ cut here ]------------
Feb 02 10:29:50 trudex kernel: WARNING: CPU: 4 PID: 1210 at kernel/kthread.c:510 kthread_park+0x85/0xa0
Feb 02 10:29:50 trudex kernel: Modules linked in: rfcomm bnep 8021q mei_hdcp mxm_wmi amdgpu btusb btrtl btbcm btintel bluetooth snd_hda_codec_hdmi snd_hda_intel ecdh_generic rfkill snd_intel_nhlt snd_hda_codec ecc snd_oxygen snd_oxygen_lib snd_hda_core snd_mpu401_uart e>
Feb 02 10:29:50 trudex kernel: CPU: 4 PID: 1210 Comm: ThreadPoolForeg Not tainted 5.4.16-900.native #1
Feb 02 10:29:50 trudex kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z170M Pro4S, BIOS P7.40 01/23/2018
Feb 02 10:29:50 trudex kernel: RIP: 0010:kthread_park+0x85/0xa0
Feb 02 10:29:50 trudex kernel: Code: 32 31 c0 5b 31 f6 41 5c 5d 89 f7 c3 0f 0b a8 04 49 8b 9c 24 c8 05 00 00 74 ab 0f 0b 5b b8 da ff ff ff 31 f6 41 5c 5d 89 f7 c3 <0f> 0b b8 f0 ff ff ff eb d0 0f 0b eb cc 66 66 2e 0f 1f 84 00 00 00
Feb 02 10:29:50 trudex kernel: RSP: 0018:ffffafd401edfaf8 EFLAGS: 00010202
Feb 02 10:29:50 trudex kernel: RAX: 0000000000000004 RBX: ffffa39cd04f5240 RCX: 0000000000000000
Feb 02 10:29:50 trudex kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa39a9f151f40
Feb 02 10:29:50 trudex kernel: RBP: ffffafd401edfb08 R08: 0000000000000000 R09: 0000000000000000
Feb 02 10:29:50 trudex kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffa39a9f151f40
Feb 02 10:29:50 trudex kernel: R13: ffffa398eada0000 R14: ffffa398eada4e88 R15: 0000000000000206
Feb 02 10:29:50 trudex kernel: FS: 00007efe25c67700(0000) GS:ffffa39cd6300000(0000) knlGS:0000000000000000
Feb 02 10:29:50 trudex kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 02 10:29:50 trudex kernel: CR2: 00007f2b7562a000 CR3: 000000038a88e004 CR4: 00000000003606e0
Feb 02 10:29:50 trudex kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 02 10:29:50 trudex kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 02 10:29:50 trudex kernel: Call Trace:
Feb 02 10:29:50 trudex kernel: drm_sched_entity_fini+0x46/0x1d0 [gpu_sched]
Feb 02 10:29:50 trudex kernel: drm_sched_entity_destroy+0x1b/0x30 [gpu_sched]
Feb 02 10:29:50 trudex kernel: amdgpu_vm_fini+0x4e/0x3e0 [amdgpu]
Feb 02 10:29:50 trudex kernel: amdgpu_driver_postclose_kms+0x17c/0x250 [amdgpu]
Feb 02 10:29:50 trudex kernel: drm_file_free.part.0+0x232/0x2f0
Feb 02 10:29:50 trudex kernel: drm_close_helper.isra.0+0x6e/0x80
Feb 02 10:29:50 trudex kernel: drm_release+0x4c/0x90
Feb 02 10:29:50 trudex kernel: __fput+0xbf/0x270
Feb 02 10:29:50 trudex kernel: ____fput+0x9/0x10
Feb 02 10:29:50 trudex kernel: task_work_run+0x8f/0xc0
Feb 02 10:29:50 trudex kernel: do_exit+0x347/0xb50
Feb 02 10:29:50 trudex kernel: ? hrtimer_cancel+0x10/0x20
Feb 02 10:29:50 trudex kernel: do_group_exit+0x3e/0xa0
Feb 02 10:29:50 trudex kernel: get_signal+0x159/0x830
Feb 02 10:29:50 trudex kernel: do_signal+0x2f/0x270
Feb 02 10:29:50 trudex kernel: ? do_futex+0x122/0x1f0
Feb 02 10:29:50 trudex kernel: ? __x64_sys_futex+0x12b/0x160
Feb 02 10:29:50 trudex kernel: exit_to_usermode_loop+0x69/0xd0
Feb 02 10:29:50 trudex kernel: do_syscall_64+0x180/0x1c0
Feb 02 10:29:50 trudex kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 02 10:29:50 trudex kernel: RIP: 0033:0x7efe39989e40
Feb 02 10:29:50 trudex kernel: Code: Bad RIP value.
Feb 02 10:29:50 trudex kernel: RSP: 002b:00007efe25c66620 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
Feb 02 10:29:50 trudex kernel: RAX: fffffffffffffdfc RBX: 00007efe25c66710 RCX: 00007efe39989e40
Feb 02 10:29:50 trudex kernel: RDX: 0000000000000000 RSI: 0000000000000089 RDI: 00007efe25c66808
Feb 02 10:29:50 trudex kernel: RBP: 00007efe25c667e0 R08: 0000000000000000 R09: 00000000ffffffff
Feb 02 10:29:50 trudex kernel: R10: 00007efe25c66710 R11: 0000000000000246 R12: 00007efe25c667b8
Feb 02 10:29:50 trudex kernel: R13: 00007efe25c66670 R14: 00007efe25c66808 R15: 00007efe25c66804
Feb 02 10:29:50 trudex kernel: ---[ end trace eab922733aa26bfb ]---
Feb 02 10:29:50 trudex systemd[1]: Started Telemetrics Daemon.
Feb 02 10:29:50 trudex systemd[1]: Started Telemetrics Post Daemon.
Feb 02 10:29:55 trudex kernel: [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
Feb 02 10:29:55 trudex kernel: amdgpu 0000:03:00.0: GPU BACO reset
Feb 02 10:29:55 trudex kernel: amdgpu 0000:03:00.0: GPU reset succeeded, trying to resume
Feb 02 10:29:55 trudex kernel: [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
Feb 02 10:29:55 trudex kernel: [drm] VRAM is lost due to GPU reset!
Feb 02 10:29:55 trudex kernel: [drm] PSP is resuming...
Feb 02 10:29:55 trudex kernel: [drm] reserve 0x400000 from 0xf5fe800000 for PSP TMR
Feb 02 10:29:55 trudex kernel: [drm] UVD and UVD ENC initialized successfully.
Feb 02 10:29:56 trudex kernel: [drm] VCE initialized successfully.
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring gfx uses VM inv eng 0 on hub 0
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring sdma0 uses VM inv eng 0 on hub 1
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring page0 uses VM inv eng 1 on hub 1
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring sdma1 uses VM inv eng 4 on hub 1
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring page1 uses VM inv eng 5 on hub 1
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring vce0 uses VM inv eng 9 on hub 1
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring vce1 uses VM inv eng 10 on hub 1
Feb 02 10:29:56 trudex kernel: amdgpu 0000:03:00.0: ring vce2 uses VM inv eng 11 on hub 1
Feb 02 10:29:56 trudex kernel: [drm] ECC is not present.
Feb 02 10:29:56 trudex kernel: [drm] SRAM ECC is not present.