https://bugzilla.kernel.org/show_bug.cgi?id=205585

--- Comment #3 from Timothy Pearson (tpearson@xxxxxxxxxxxxxxxxxxxxx) ---
Just had a chance to test on 5.4.0, still fails (haven't had a chance to bisect yet; I suspect it's more related to the 64-bit enablement on POWER in 5.4 than anything else).

The EEH is quite strange, the PEST register decodes as:

MMIO CFG Read Other Transaction Type
An MMIO Load, MMIO I/O Write, or other transaction returned from the PCIe link with a status of Unsupported Request (UR)
Failure address: 0x000000000000

Full trace

[20341.276752702,3] PHB#0033[8:3]: PHB Freeze/Fence detected !
[20341.276848173,3] PHB#0033[8:3]: PCI FIR=2000000000000000
[20341.276900504,3] PHB#0033[8:3]: PCI FIR WOF=2000000000000000
[20341.276939625,3] PHB#0033[8:3]: NEST FIR=0000800000000000
[20341.276979866,3] PHB#0033[8:3]: NEST FIR WOF=0000800000000000
[20341.277023394,3] PHB#0033[8:3]: ERR RPT0=0000000000000001
[20341.277068184,3] PHB#0033[8:3]: ERR RPT1=0000000000000000
[20341.277110812,3] PHB#0033[8:3]: AIB ERR=0000200000000000
[20341.277830701,3] PHB#0033[8:3]: brdgCtl = 00000002
[20341.277906614,3] PHB#0033[8:3]: deviceStatus = 00000020
[20341.277946469,3] PHB#0033[8:3]: slotStatus = 00402000
[20341.277981186,3] PHB#0033[8:3]: linkStatus = e9010008
[20341.278025974,3] PHB#0033[8:3]: devCmdStatus = 00100107
[20341.278068859,3] PHB#0033[8:3]: devSecStatus = 00000000
[20341.278109829,3] PHB#0033[8:3]: rootErrorStatus = 00000000
[20341.278149196,3] PHB#0033[8:3]: corrErrorStatus = 00000000
[20341.278190145,3] PHB#0033[8:3]: uncorrErrorStatus = 00000000
[20341.278223684,3] PHB#0033[8:3]: devctl = 00000020
[20341.278276525,3] PHB#0033[8:3]: devStat = 00000000
[20341.278314241,3] PHB#0033[8:3]: tlpHdr1 = 00000000
[20341.278356746,3] PHB#0033[8:3]: tlpHdr2 = 00000000
[20341.278397163,3] PHB#0033[8:3]: tlpHdr3 = 00000000
[20341.278440709,3] PHB#0033[8:3]: tlpHdr4 = 00000000
[20341.278478424,3] PHB#0033[8:3]: sourceId = 00000000
[20341.278516547,3] PHB#0033[8:3]: nFir = 0000800000000000
[20341.278555975,3] PHB#0033[8:3]: nFirMask = 0030001c00000000
[20341.278598653,3] PHB#0033[8:3]: nFirWOF = 0000800000000000
[20341.278642004,3] PHB#0033[8:3]: phbPlssr = 0000001800000000
[20341.278686870,3] PHB#0033[8:3]: phbCsr = 0000001800000000
[20341.278731874,3] PHB#0033[8:3]: lemFir = 0004000100000100
[20341.278776158,3] PHB#0033[8:3]: lemErrorMask = 0000000000000000
[20341.278815229,3] PHB#0033[8:3]: lemWOF = 0000000100000000
[20341.278857015,3] PHB#0033[8:3]: phbErrorStatus = 000005a000000000
[20341.278909821,3] PHB#0033[8:3]: phbFirstErrorStatus = 0000002000000000
[20341.278951950,3] PHB#0033[8:3]: phbErrorLog0 = 2148000098000240
[20341.278999524,3] PHB#0033[8:3]: phbErrorLog1 = a008400000000000
[20341.279042839,3] PHB#0033[8:3]: phbTxeErrorStatus = 0000200000000000
[20341.279081676,3] PHB#0033[8:3]: phbTxeFirstErrorStatus = 0000200000000000
[20341.279120945,3] PHB#0033[8:3]: phbTxeErrorLog0 = 4000000000000000
[20341.279160833,3] PHB#0033[8:3]: phbTxeErrorLog1 = 0000000000000000
[20341.279207802,3] PHB#0033[8:3]: phbRxeArbErrorStatus = 0000000000000000
[20341.279254658,3] PHB#0033[8:3]: phbRxeArbFrstErrorStatus = 0000000000000000
[20341.279297181,3] PHB#0033[8:3]: phbRxeArbErrorLog0 = 0000000000000000
[20341.279334227,3] PHB#0033[8:3]: phbRxeArbErrorLog1 = 0000000000000000
[20341.279376968,3] PHB#0033[8:3]: phbRxeMrgErrorStatus = 0000000000000001
[20341.279420726,3] PHB#0033[8:3]: phbRxeMrgFrstErrorStatus = 0000000000000001
[20341.279469009,3] PHB#0033[8:3]: phbRxeMrgErrorLog0 = 0000000000000000
[20341.279512839,3] PHB#0033[8:3]: phbRxeMrgErrorLog1 = 0000000000000000
[20341.279561496,3] PHB#0033[8:3]: phbRxeTceErrorStatus = 0000000000000000
[20341.279604696,3] PHB#0033[8:3]: phbRxeTceFrstErrorStatus = 0000000000000000
[20341.279645952,3] PHB#0033[8:3]: phbRxeTceErrorLog0 = 0000000000000000
[20341.279685644,3] PHB#0033[8:3]: phbRxeTceErrorLog1 = 0000000000000000
[20341.279731458,3] PHB#0033[8:3]: phbPblErrorStatus = 0000000000000800
[20341.279778323,3] PHB#0033[8:3]: phbPblFirstErrorStatus = 0000000000000800
[20341.279825433,3] PHB#0033[8:3]: phbPblErrorLog0 = 0000000000000000
[20341.279866852,3] PHB#0033[8:3]: phbPblErrorLog1 = 00000000028de410
[20341.279903104,3] PHB#0033[8:3]: phbPcieDlpErrorLog1 = 0000000000000000
[20341.279942888,3] PHB#0033[8:3]: phbPcieDlpErrorLog2 = 0000000000000000
[20341.279984925,3] PHB#0033[8:3]: phbPcieDlpErrorStatus = 0000000000000000
[20341.280033282,3] PHB#0033[8:3]: phbRegbErrorStatus = 0010001000000000
[20341.280080310,3] PHB#0033[8:3]: phbRegbFirstErrorStatus = 0000001000000000
[20341.280126330,3] PHB#0033[8:3]: phbRegbErrorLog0 = 4800003c00000000
[20341.280173657,3] PHB#0033[8:3]: phbRegbErrorLog1 = 0000000000000200
[20341.280218925,3] PHB#0033[8:3]: PEST[1ff] = 3740002a01000000 0000000000000000

[ 1580.231935] EEH: PHB#33 failure detected, location: N/A
[ 1580.231958] EEH: Frozen PHB#33-PE#0 detected
[ 1580.231969] EEH: Call Trace:
[ 1580.231983] EEH: [00000000741e7c92] __eeh_send_failure_event+0x78/0x150
[ 1580.232006] EEH: [0000000019c0a3ea] eeh_dev_check_failure+0x1d8/0x6b0
[ 1580.232019] EEH: [00000000d1114f7e] eeh_check_failure+0x98/0x100
[ 1580.232080] EEH: [0000000026fdad67] amdgpu_mm_rreg+0x20c/0x250 [amdgpu]
[ 1580.232134] EEH: [0000000087736ee4] vi_flush_hdp+0xa0/0xc0 [amdgpu]
[ 1580.232191] EEH: [000000000b00465e] amdgpu_gart_bind+0x78/0x140 [amdgpu]
[ 1580.232247] EEH: [00000000e410157a] amdgpu_ttm_gart_bind+0x124/0x140 [amdgpu]
[ 1580.232295] EEH: [0000000027696b17] amdgpu_ttm_alloc_gart+0x19c/0x230 [amdgpu]
[ 1580.232350] EEH: [00000000abff626d] amdgpu_vm_sdma_map_table+0x4c/0x70 [amdgpu]
[ 1580.232411] EEH: [000000003babc62e] amdgpu_vm_clear_bo+0x188/0x460 [amdgpu]
[ 1580.232460] EEH: [000000003135d9d5] amdgpu_vm_update_ptes+0x300/0x5f0 [amdgpu]
[ 1580.232513] EEH: [00000000a9b62a4c] amdgpu_vm_bo_update_mapping+0x100/0x140 [amdgpu]
[ 1580.232565] EEH: [00000000c53ee852] amdgpu_vm_bo_update+0x348/0x8a0 [amdgpu]
[ 1580.232614] EEH: [00000000e468e987] amdgpu_gem_va_ioctl+0x5c4/0x620 [amdgpu]
[ 1580.232644] EEH: [000000002c0a19e7] drm_ioctl_kernel+0xfc/0x180 [drm]
[ 1580.232671] EEH: [000000005cb0f244] drm_ioctl+0x238/0x480 [drm]
[ 1580.232725] EEH: [00000000b812c3a6] amdgpu_drm_ioctl+0x70/0xd0 [amdgpu]
[ 1580.232749] EEH: [000000004de566d7] do_vfs_ioctl+0xe0/0xac0
[ 1580.232770] EEH: [0000000045206404] ksys_ioctl+0xc4/0x110
[ 1580.232782] EEH: [000000001e273b3a] sys_ioctl+0x28/0x80
[ 1580.232804] EEH: [00000000aa248bf4] system_call+0x5c/0x68
[ 1580.232834] EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
[ 1580.232880] EEH: Notify device drivers to shutdown
[ 1580.232911] EEH: Beginning: 'error_detected(IO frozen)'
[ 1580.232933] PCI 0033:00:00.0#01fe: EEH: no driver
[ 1580.232935] PCI 0033:01:00.0#0000: EEH: driver not EEH aware
[ 1580.232957] PCI 0033:01:00.1#0000: EEH: driver not EEH aware
[ 1580.232970] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'none'
[ 1580.232998] EEH: Collect temporary log
[ 1580.233008] PHB4 PHB#51 Diag-data (Version: 1)
[ 1580.233018] brdgCtl: 00000002
[ 1580.233028] RootSts: 00000020 00402000 e9010008 00100107 00000000
[ 1580.233040] nFir: 0000800000000000 0030001c00000000 0000800000000000
[ 1580.233062] PhbSts: 0000001800000000 0000001800000000
[ 1580.233082] Lem: 0004000100000100 0000000000000000 0000000100000000
[ 1580.233104] PhbErr: 000005a000000000 0000002000000000 2148000098000240 a008400000000000
[ 1580.233136] PhbTxeErr: 0000200000000000 0000200000000000 4000000000000000 0000000000000000
[ 1580.233169] RxeMrgErr: 0000000000000001 0000000000000001 0000000000000000 0000000000000000
[ 1580.233192] PblErr: 0000000000000800 0000000000000800 0000000000000000 00000000028de410
[ 1580.233225] RegbErr: 0010001000000000 0000001000000000 4800003c00000000 0000000000000200
[ 1580.233259] EEH: Reset with hotplug activity
[ 1580.891352] snd_hda_codec_hdmi hdaudioC0D0: Unable to sync register 0x2f0d00. -5
[ 1590.340025] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=7463, emitted seq=7465
[ 1590.340117] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[ 1590.340172] amdgpu 0033:01:00.0: GPU reset begin!
[ 1590.350000] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=325761, emitted seq=325763
[ 1590.350057] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process hyperspace pid 4160 thread hyperspace:cs0 pid 4161
[ 1590.350089] amdgpu 0033:01:00.0: GPU reset begin!
[ 1590.350108] [drm] Bailing on TDR for s_job:4f608, as another already in progress
[ 1590.350923] amdgpu: [powerplay]
[ 1590.350923] last message was failed ret is 65535
[ 1590.350949] amdgpu: [powerplay]
[ 1590.350949] failed to send message 261 ret is 65535
[ 1590.350971] amdgpu: [powerplay]
[ 1590.350971] last message was failed ret is 65535
[ 1590.350983] amdgpu: [powerplay]
[ 1590.350983] failed to send message 261 ret is 65535
[ 1590.350996] amdgpu: [powerplay]
[ 1590.350996] last message was failed ret is 65535
[ 1590.351017] amdgpu: [powerplay]
[ 1590.351017] failed to send message 261 ret is 65535
[ 1590.351030] amdgpu: [powerplay]
[ 1590.351030] last message was failed ret is 65535
[ 1590.351064] amdgpu: [powerplay]
[ 1590.351064] failed to send message 261 ret is 65535
[ 1590.351096] amdgpu: [powerplay]
[ 1590.351096] last message was failed ret is 65535
[ 1590.351127] amdgpu: [powerplay]
[ 1590.351127] failed to send message 261 ret is 65535
[ 1590.351158] amdgpu: [powerplay]
[ 1590.351158] last message was failed ret is 65535
[ 1590.351202] amdgpu: [powerplay]
[ 1590.351202] failed to send message 261 ret is 65535
[ 1590.351224] amdgpu: [powerplay]
[ 1590.351224] last message was failed ret is 65535
[ 1590.351236] amdgpu: [powerplay]
[ 1590.351236] failed to send message 261 ret is 65535
[ 1590.351251] amdgpu: [powerplay]
[ 1590.351251] last message was failed ret is 65535
[ 1590.351272] amdgpu: [powerplay]
[ 1590.351272] failed to send message 261 ret is 65535
[ 1590.351303] amdgpu: [powerplay]
[ 1590.351303] last message was failed ret is 65535
[ 1590.351324] amdgpu: [powerplay]
[ 1590.351324] failed to send message 261 ret is 65535
[ 1590.351356] amdgpu: [powerplay]
[ 1590.351356] last message was failed ret is 65535
[ 1590.351378] amdgpu: [powerplay]
[ 1590.351378] failed to send message 261 ret is 65535
[ 1590.351410] amdgpu: [powerplay]
[ 1590.351410] last message was failed ret is 65535
[ 1590.351441] amdgpu: [powerplay]
[ 1590.351441] failed to send message 261 ret is 65535
[ 1590.351463] amdgpu: [powerplay]
[ 1590.351463] last message was failed ret is 65535
[ 1590.351485] amdgpu: [powerplay]
[ 1590.351485] failed to send message 261 ret is 65535
[ 1590.351520] amdgpu: [powerplay]
[ 1590.351520] last message was failed ret is 65535
[ 1590.351541] amdgpu: [powerplay]
[ 1590.351541] failed to send message 261 ret is 65535
[ 1590.351572] amdgpu: [powerplay]
[ 1590.351572] last message was failed ret is 65535
[ 1590.351603] amdgpu: [powerplay]
[ 1590.351603] failed to send message 261 ret is 65535
[ 1590.351634] amdgpu: [powerplay]
[ 1590.351634] last message was failed ret is 65535
[ 1590.351666] amdgpu: [powerplay]
[ 1590.351666] failed to send message 261 ret is 65535
[ 1590.351698] amdgpu: [powerplay]
[ 1590.351698] last message was failed ret is 65535
[ 1590.351730] amdgpu: [powerplay]
[ 1590.351730] failed to send message 261 ret is 65535
[ 1590.351761] amdgpu: [powerplay]
[ 1590.351761] last message was failed ret is 65535
[ 1590.351795] amdgpu: [powerplay]
[ 1590.351795] failed to send message 261 ret is 65535
[ 1590.351980] amdgpu: [powerplay]
[ 1590.351980] last message was failed ret is 65535
[ 1590.352014] amdgpu: [powerplay]
[ 1590.352014] failed to send message 306 ret is 65535
[ 1590.352039] amdgpu: [powerplay]
[ 1590.352039] last message was failed ret is 65535
[ 1590.352080] amdgpu: [powerplay]
[ 1590.352080] failed to send message 5e ret is 65535
[ 1590.352103] amdgpu: [powerplay]
[ 1590.352103] last message was failed ret is 65535
[ 1590.352134] amdgpu: [powerplay]
[ 1590.352134] failed to send message 145 ret is 65535
[ 1590.352156] amdgpu: [powerplay]
[ 1590.352156] last message was failed ret is 65535
[ 1590.352190] amdgpu: [powerplay]
[ 1590.352190] failed to send message 146 ret is 65535
[ 1590.352225] amdgpu: [powerplay]
[ 1590.352225] last message was failed ret is 65535
[ 1590.352271] amdgpu: [powerplay]
[ 1590.352271] failed to send message 148 ret is 65535
[ 1590.352292] amdgpu: [powerplay]
[ 1590.352292] last message was failed ret is 65535
[ 1590.352304] amdgpu: [powerplay]
[ 1590.352304] failed to send message 145 ret is 65535
[ 1590.352339] amdgpu: [powerplay]
[ 1590.352339] last message was failed ret is 65535
[ 1590.352370] amdgpu: [powerplay]
[ 1590.352370] failed to send message 146 ret is 65535
[ 1590.383835] [drm] REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:956
[ 1590.383875] ------------[ cut here ]------------
[ 1590.383912] WARNING: CPU: 48 PID: 1214 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:332 generic_reg_wait+0x214/0x230 [amdgpu]
[ 1590.383945] Modules linked in: i2c_dev uinput amdgpu snd_usb_audio drm_vram_helper snd_usbmidi_lib gpu_sched ttm snd_rawmidi snd_seq_device ses mc drm_kms_helper snd_hda_codec_hdmi enclosure joydev sd_mod evdev scsi_transport_sas drm snd_hda_intel sg snd_hda_codec drm_panel_orientation_quirks snd_hda_core syscopyarea sysfillrect ecb snd_hwdep aacraid sysimgblt fb_sys_fops snd_pcm nvme nvme_core xts i2c_algo_bit snd_timer snd soundcore ctr cbc ofpart vmx_crypto ipmi_powernv ipmi_devintf powernv_flash gf128mul mtd ipmi_msghandler opal_prd at24 binfmt_misc parport_pc lp parport ip_tables x_tables autofs4 nfsv3 nfs_acl nfs lockd grace sunrpc fscache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod xhci_pci xhci_hcd usbcore tg3 libphy
[ 1590.384181] CPU: 48 PID: 1214 Comm: kworker/48:2 Not tainted 5.4.0 #5
[ 1590.384194] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 1590.384205] NIP: c00800000888505c LR: c00800000888504c CTR: c000000000715d70
[ 1590.384238] REGS: c0000007dd55ec40 TRAP: 0700 Not tainted (5.4.0)
[ 1590.384257] MSR: 9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28224228 XER: 00000000
[ 1590.384284] CFAR: c0000000001b66f4 IRQMASK: 0
[ 1590.384284] GPR00: c00800000888504c c0000007dd55eed0 c0080000089f5000 0000000000000052
[ 1590.384284] GPR04: c0000007fdd1ce18 c0000007fdda5858 0000000000000490 c0000007fffc9000
[ 1590.384284] GPR08: 0000000000000007 0000000000000000 00000007fced0000 9000000002001033
[ 1590.384284] GPR12: 0000000000004000 c0000007fffc9000 c000200715000000 c0000007eff449c0
[ 1590.384284] GPR16: c0000007dc7a6000 c0000007def45300 0000000000000000 00000000000003bc
[ 1590.384284] GPR20: c0080000088f6470 0000000000000000 0000000000004ea4 0000000000010000
[ 1590.384284] GPR24: 0000000000000000 c00800000890ca90 c0000007a9e40680 0000000000000bb8
[ 1590.384284] GPR28: 0000000000000010 0000000000000bb8 000000000000000a 0000000000000bb9
[ 1590.384414] NIP [c00800000888505c] generic_reg_wait+0x214/0x230 [amdgpu]
[ 1590.384450] LR [c00800000888504c] generic_reg_wait+0x204/0x230 [amdgpu]
[ 1590.384467] Call Trace:
[ 1590.384499] [c0000007dd55eed0] [c00800000888504c] generic_reg_wait+0x204/0x230 [amdgpu] (unreliable)
[ 1590.384548] [c0000007dd55efa0] [c00800000882caec] dce110_stream_encoder_dp_blank+0x104/0x170 [amdgpu]
[ 1590.384601] [c0000007dd55f030] [c00800000885a07c] dce110_blank_stream+0xf4/0x120 [amdgpu]
[ 1590.384632] [c0000007dd55f060] [c0080000088743bc] core_link_disable_stream+0x64/0x420 [amdgpu]
[ 1590.384692] [c0000007dd55f140] [c008000008857dbc] dce110_reset_hw_ctx_wrap+0xf4/0x2e0 [amdgpu]
[ 1590.384745] [c0000007dd55f200] [c00800000885a2e0] dce110_apply_ctx_to_hw+0x58/0x600 [amdgpu]
[ 1590.384797] [c0000007dd55f2d0] [c00800000886dcec] dc_commit_state+0x3d4/0x820 [amdgpu]
[ 1590.384853] [c0000007dd55f400] [c0080000087fe94c] amdgpu_dm_atomic_commit_tail+0x3c4/0x19a8 [amdgpu]
[ 1590.384888] [c0000007dd55f700] [c008000007d93fb0] commit_tail+0xf8/0x1f0 [drm_kms_helper]
[ 1590.384912] [c0000007dd55f740] [c008000007d942a8] drm_atomic_helper_commit+0x1e0/0x1f0 [drm_kms_helper]
[ 1590.384951] [c0000007dd55f780] [c0080000087fbac8] amdgpu_dm_atomic_commit+0x110/0x140 [amdgpu]
[ 1590.384992] [c0000007dd55f7e0] [c0080000079ce2cc] drm_atomic_commit+0x74/0xa0 [drm]
[ 1590.385016] [c0000007dd55f850] [c008000007d94768] drm_atomic_helper_disable_all+0x290/0x2b0 [drm_kms_helper]
[ 1590.385044] [c0000007dd55f8a0] [c008000007d949dc] drm_atomic_helper_suspend+0x154/0x1a0 [drm_kms_helper]
[ 1590.385094] [c0000007dd55f920] [c0080000087f717c] dm_suspend+0x44/0xa0 [amdgpu]
[ 1590.385124] [c0000007dd55f950] [c008000008621e2c] amdgpu_device_ip_suspend_phase1+0xe4/0x190 [amdgpu]
[ 1590.385163] [c0000007dd55f9d0] [c008000008623ddc] amdgpu_device_ip_suspend+0x44/0xe0 [amdgpu]
[ 1590.385192] [c0000007dd55fa10] [c00800000888de54] amdgpu_device_pre_asic_reset+0x248/0x28c [amdgpu]
[ 1590.385230] [c0000007dd55fab0] [c00800000888e7b8] amdgpu_device_gpu_recover+0x2f0/0xb4c [amdgpu]
[ 1590.385268] [c0000007dd55fb90] [c008000008779f3c] amdgpu_job_timedout+0x124/0x170 [amdgpu]
[ 1590.385290] [c0000007dd55fc30] [c008000007651244] drm_sched_job_timedout+0x6c/0x110 [gpu_sched]
[ 1590.385336] [c0000007dd55fc70] [c000000000154ee0] process_one_work+0x260/0x520
[ 1590.385379] [c0000007dd55fd10] [c000000000155228] worker_thread+0x88/0x5f0
[ 1590.385400] [c0000007dd55fdb0] [c00000000015f21c] kthread+0x19c/0x1b0
[ 1590.385430] [c0000007dd55fe20] [c00000000000bd54] ret_from_kernel_thread+0x5c/0x68
[ 1590.385463] Instruction dump:
[ 1590.385480] 4bfffed4 3c620000 e8633ab8 7e679b78 7e86a378 7f65db78 7fc4f378 4800f091
[ 1590.385513] e8410018 813a0020 2f890001 419eff7c <0fe00000> 4bffff74 60000000 60000000
[ 1590.385546] ---[ end trace 59567a2f8b8649ed ]---
[ 1591.478349] PCI 0033:01:00.0#0000: EEH: 2100000 reads ignored for recovering device at location=CPU2 Slot1 (16x) driver=amdgpu
[ 1591.478370] PCI 0033:01:00.0#0000: EEH: Might be infinite loop in amdgpu driver
[ 1591.478382] CPU: 48 PID: 1214 Comm: kworker/48:2 Tainted: G W 5.4.0 #5
[ 1591.478405] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 1591.478414] Call Trace:
[ 1591.478422] [c0000007dd55e940] [c000000000a9ccc8] dump_stack+0xbc/0x104 (unreliable)
[ 1591.478434] [c0000007dd55e980] [c00000000003e788] eeh_dev_check_failure+0x598/0x6b0
[ 1591.478455] [c0000007dd55ea30] [c00000000003eb08] eeh_check_failure+0x98/0x100
[ 1591.478491] [c0000007dd55ea70] [c008000008622744] amdgpu_mm_rreg+0x20c/0x250 [amdgpu]
[ 1591.478539] [c0000007dd55eac0] [c0080000086298f4] cail_reg_read+0x2c/0x50 [amdgpu]
[ 1591.478577] [c0000007dd55eae0] [c00800000863255c] atom_get_src_int+0x104/0xa00 [amdgpu]
[ 1591.478615] [c0000007dd55eb90] [c008000008633e30] atom_op_test+0xd8/0x1d0 [amdgpu]
[ 1591.478660] [c0000007dd55ec20] [c008000008636a2c] amdgpu_atom_execute_table_locked+0x204/0x3e0 [amdgpu]
[ 1591.478701] [c0000007dd55ed20] [c008000008636d30] atom_op_calltable+0x128/0x1e0 [amdgpu]
[ 1591.478740] [c0000007dd55eda0] [c008000008636a2c] amdgpu_atom_execute_table_locked+0x204/0x3e0 [amdgpu]
[ 1591.478770] [c0000007dd55eea0] [c008000008636e58] amdgpu_atom_execute_table+0x70/0xb0 [amdgpu]
[ 1591.478829] [c0000007dd55eee0] [c008000008810f30] transmitter_control_v1_6+0x128/0x220 [amdgpu]
[ 1591.478887] [c0000007dd55ef40] [c00800000880c410] bios_parser_transmitter_control+0x38/0x70 [amdgpu]
[ 1591.478944] [c0000007dd55ef60] [c00800000882f678] dce110_link_encoder_disable_output+0xd0/0x1c0 [amdgpu]
[ 1591.478997] [c0000007dd55f020] [c00800000887cbfc] dp_disable_link_phy+0xa4/0x1d0 [amdgpu]
[ 1591.479029] [c0000007dd55f060] [c008000008874488] core_link_disable_stream+0x130/0x420 [amdgpu]
[ 1591.479082] [c0000007dd55f140] [c008000008857dbc] dce110_reset_hw_ctx_wrap+0xf4/0x2e0 [amdgpu]
[ 1591.479134] [c0000007dd55f200] [c00800000885a2e0] dce110_apply_ctx_to_hw+0x58/0x600 [amdgpu]
[ 1591.479186] [c0000007dd55f2d0] [c00800000886dcec] dc_commit_state+0x3d4/0x820 [amdgpu]
[ 1591.479241] [c0000007dd55f400] [c0080000087fe94c] amdgpu_dm_atomic_commit_tail+0x3c4/0x19a8 [amdgpu]
[ 1591.479280] [c0000007dd55f700] [c008000007d93fb0] commit_tail+0xf8/0x1f0 [drm_kms_helper]
[ 1591.479325] [c0000007dd55f740] [c008000007d942a8] drm_atomic_helper_commit+0x1e0/0x1f0 [drm_kms_helper]
[ 1591.479381] [c0000007dd55f780] [c0080000087fbac8] amdgpu_dm_atomic_commit+0x110/0x140 [amdgpu]
[ 1591.479419] [c0000007dd55f7e0] [c0080000079ce2cc] drm_atomic_commit+0x74/0xa0 [drm]
[ 1591.479445] [c0000007dd55f850] [c008000007d94768] drm_atomic_helper_disable_all+0x290/0x2b0 [drm_kms_helper]
[ 1591.479484] [c0000007dd55f8a0] [c008000007d949dc] drm_atomic_helper_suspend+0x154/0x1a0 [drm_kms_helper]
[ 1591.479542] [c0000007dd55f920] [c0080000087f717c] dm_suspend+0x44/0xa0 [amdgpu]
[ 1591.479589] [c0000007dd55f950] [c008000008621e2c] amdgpu_device_ip_suspend_phase1+0xe4/0x190 [amdgpu]
[ 1591.479640] [c0000007dd55f9d0] [c008000008623ddc] amdgpu_device_ip_suspend+0x44/0xe0 [amdgpu]
[ 1591.479674] [c0000007dd55fa10] [c00800000888de54] amdgpu_device_pre_asic_reset+0x248/0x28c [amdgpu]
[ 1591.479712] [c0000007dd55fab0] [c00800000888e7b8] amdgpu_device_gpu_recover+0x2f0/0xb4c [amdgpu]
[ 1591.479769] [c0000007dd55fb90] [c008000008779f3c] amdgpu_job_timedout+0x124/0x170 [amdgpu]
[ 1591.479815] [c0000007dd55fc30] [c008000007651244] drm_sched_job_timedout+0x6c/0x110 [gpu_sched]
[ 1591.479860] [c0000007dd55fc70] [c000000000154ee0] process_one_work+0x260/0x520
[ 1591.479903] [c0000007dd55fd10] [c000000000155228] worker_thread+0x88/0x5f0
[ 1591.479923] [c0000007dd55fdb0] [c00000000015f21c] kthread+0x19c/0x1b0
[ 1591.479953] [c0000007dd55fe20] [c00000000000bd54] ret_from_kernel_thread+0x5c/0x68
[ 1592.584699] PCI 0033:01:00.0#0000: EEH: 4200000 reads ignored for recovering device at location=CPU2 Slot1 (16x) driver=amdgpu
[ 1592.584723] PCI 0033:01:00.0#0000: EEH: Might be infinite loop in amdgpu driver