We see a gfx ring timeout followed by a GPU reset while hundreds of apps are running on an AMD GPU. The error happens about once a week.

Env:
Linux version 5.3.15-050315.2020063001-generic (root@k8snode) (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)) #appstream SMP PREEMPT Sat Jul 4 10:28:24 CST 2020
vainfo: VA-API version: 1.5 (libva 2.5.0)
vainfo: Driver version: Mesa Gallium driver 19.0.5 for AMD Radeon (TM) RX 480 Graphics (POLARIS10, DRM 3.33.0, 5.3.15-050315.2020063001-generic, LLVM 7.0.0)
vainfo: Supported profile and entrypoints
ii libdrm-dev 2.4.97-1ubuntu1~18.04.1 Userspace interface to kernel DRM services -- development files

log:
Sep 29 17:42:25 k8snode244 kernel: [952607.136002] amdgpu 0005:01:00.0: GPU fault detected: 146 0x0000480c for process Media42797 pid 637190 thread appstream:cs0 pid 637206
Sep 29 17:42:25 k8snode244 kernel: [952607.136007] amdgpu 0005:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Sep 29 17:42:25 k8snode244 kernel: [952607.136009] amdgpu 0005:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E04800C
Sep 29 17:42:25 k8snode244 kernel: [952607.136013] amdgpu 0005:01:00.0: VM fault (0x0c, vmid 7, pasid 32774) at page 0, read from 'TC4' (0x54433400) (72)
Sep 29 17:42:35 k8snode244 kernel: [952617.235478] [drm:amdgpu_job_timedout [amdgpu]]*ERROR* ring gfx timeout, signaled seq=313011137, emitted seq=313011140
Sep 29 17:42:35 k8snode244 kernel: [952617.235560] [drm:amdgpu_job_timedout [amdgpu]]*ERROR* Process information: process Media42797 pid 637190 thread appstream:cs0 pid 637206
Sep 29 17:42:35 k8snode244 kernel: [952617.235578] amdgpu 0005:01:00.0: GPU reset begin!
Sep 29 17:42:35 k8snode244 kernel: [952617.236276] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]]*ERROR* suspend of IP block <vce_v3_0> failed -22
Sep 29 17:42:36 k8snode244 kernel: [952617.842417] amdgpu 0005:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]*ERROR* ring kiq_2.1.0 test failed (-110)
Sep 29 17:42:36 k8snode244 kernel: [952617.842500] [drm:gfx_v8_0_hw_fini [amdgpu]]*ERROR* KCQ disable failed
Sep 29 17:42:36 k8snode244 kernel: [952618.098569] cp is busy, skip halt cp
Sep 29 17:42:36 k8snode244 kernel: [952618.356730] rlc is busy, skip halt rlc
Sep 29 17:42:36 k8snode244 kernel: [952618.357783] amdgpu 0005:01:00.0: GPU pci config reset
Sep 29 17:42:36 k8snode244 kernel: [952618.476296] amdgpu 0005:01:00.0: GPU reset succeeded, trying to resume
Sep 29 17:42:36 k8snode244 kernel: [952618.478750] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Sep 29 17:42:36 k8snode244 kernel: [952618.478770] [drm] VRAM is lost due to GPU reset!
Sep 29 17:42:37 k8snode244 kernel: [952618.610145] [drm] UVD and UVD ENC initialized successfully.
Sep 29 17:42:37 k8snode244 kernel: [952618.765301] [drm] VCE initialized successfully.
Sep 29 17:42:37 k8snode244 kernel: [952618.785034] [drm] recover vram bo from shadow start
Sep 29 17:42:37 k8snode244 kernel: [952618.849311] [drm] recover vram bo from shadow done
Sep 29 17:42:37 k8snode244 kernel: [952618.849319] [drm] Skip scheduling IBs!
Sep 29 17:42:37 k8snode244 kernel: [952618.849321] ------------[ cut here ]------------ Sep 29 17:42:37 k8snode244 kernel: [952618.849386] WARNING: CPU: 14 PID: 681455 at /home/lwj/build/kernel/include/linux/dma-fence.h:513 drm_sched_resubmit_jobs+0x188/0x1a8 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.849387] Modules linked in: sch_tbf veth xt_recent br_netfilter bridge stp llc xt_addrtype bpfilter ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs overlay nls_iso8859_1 ipmi_ssif snd_hda_intel snd_hda_codec joydev snd_hda_core input_leds snd_hwdep snd_pcm snd_timer snd ipmi_si soundcore ipmi_devintf ipmi_msghandler tcp_bbr sch_fq binder_dkms(OE) autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 multipath linear ses enclosure hibmc_drm hid_generic usbhid hid marvell aes_ce_blk aes_ce_cipher amdgpu i2c_algo_bit crct10dif_ce gpu_sched drm_vram_helper ttm ghash_ce sha2_ce drm_kms_helper syscopyarea sha256_arm64 sysfillrect sysimgblt fb_sys_fops sha1_ce drm hisi_sas_v2_hw hisi_sas_main libsas ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64 Sep 29 17:42:37 k8snode244 kernel: [952618.849440] CPU: 14 PID: 681455 Comm: kworker/14:0 Kdump: loaded Tainted: G W OE 5.3.15-050315.2020063001-generic #appstream Sep 29 17:42:37 k8snode244 kernel: [952618.849442] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.63 09/19/2019 Sep 29 17:42:37 k8snode244 kernel: [952618.849450] Workqueue: events drm_sched_job_timedout [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.849454] pstate: 40400005 (nZcv daif +PAN -UAO) Sep 29 17:42:37 k8snode244 kernel: [952618.849458] pc : drm_sched_resubmit_jobs+0x188/0x1a8 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.849461] lr : drm_sched_resubmit_jobs+0x188/0x1a8 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.849462] sp : ffff00009c873c20 Sep 29 17:42:37 k8snode244 kernel: 
[952618.849464] x29: ffff00009c873c20 x28: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.849466] x27: ffff808e72af2300 x26: 00000000ffffff83 Sep 29 17:42:37 k8snode244 kernel: [952618.849469] x25: ffff0000094c66b8 x24: 00000000001fb320 Sep 29 17:42:37 k8snode244 kernel: [952618.849471] x23: 0000000000000001 x22: ffff801fd37a6bb8 Sep 29 17:42:37 k8snode244 kernel: [952618.849473] x21: ffff801fd37a6a30 x20: ffff80188be5ec00 Sep 29 17:42:37 k8snode244 kernel: [952618.849476] x19: ffff801f7dcc9400 x18: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.849478] x17: 0000000000000001 x16: 0000000000000007 Sep 29 17:42:37 k8snode244 kernel: [952618.849480] x15: 0000000000000000 x14: 0000000000002400 Sep 29 17:42:37 k8snode244 kernel: [952618.849481] x13: 0000000000000000 x12: ffff000011ba7000 Sep 29 17:42:37 k8snode244 kernel: [952618.849484] x11: 0000000000078918 x10: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.849486] x9 : 0000000000000001 x8 : 000000000001a8ff Sep 29 17:42:37 k8snode244 kernel: [952618.849488] x7 : ffff000011ba7000 x6 : 00002b369e3d3bcd Sep 29 17:42:37 k8snode244 kernel: [952618.849490] x5 : 0000000000000001 x4 : ffff8017dbbc2248 Sep 29 17:42:37 k8snode244 kernel: [952618.849492] x3 : ffff8017dbbc2248 x2 : b5be5ef7ef51a000 Sep 29 17:42:37 k8snode244 kernel: [952618.849494] x1 : 0000000000000000 x0 : 0000000000000024 Sep 29 17:42:37 k8snode244 kernel: [952618.849496] Call trace: Sep 29 17:42:37 k8snode244 kernel: [952618.849500] drm_sched_resubmit_jobs+0x188/0x1a8 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.849591] amdgpu_device_gpu_recover+0x460/0xb58 [amdgpu] Sep 29 17:42:37 k8snode244 kernel: [952618.849671] amdgpu_job_timedout+0xe4/0x108 [amdgpu] Sep 29 17:42:37 k8snode244 kernel: [952618.849676] drm_sched_job_timedout+0x84/0xf8 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.849682] process_one_work+0x1ec/0x470 Sep 29 17:42:37 k8snode244 kernel: [952618.849684] worker_thread+0x48/0x458 Sep 29 
17:42:37 k8snode244 kernel: [952618.849687] kthread+0x110/0x118 Sep 29 17:42:37 k8snode244 kernel: [952618.849691] ret_from_fork+0x10/0x18 Sep 29 17:42:37 k8snode244 kernel: [952618.849692] ---[ end trace 5b779f1dd4a6e6cf ]--- Sep 29 17:42:37 k8snode244 kernel: [952618.849697] [drm] Skip scheduling IBs! Sep 29 17:42:37 k8snode244 kernel: [952618.849698] ------------[ cut here ]------------ Sep 29 17:42:37 k8snode244 kernel: [952618.849723] WARNING: CPU: 14 PID: 681455 at /home/lwj/build/kernel/include/linux/dma-fence.h:513 drm_sched_resubmit_jobs+0x188/0x1a8 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.849724] Modules linked in: sch_tbf veth xt_recent br_netfilter bridge stp llc xt_addrtype bpfilter ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs overlay nls_iso8859_1 ipmi_ssif snd_hda_intel snd_hda_codec joydev snd_hda_core input_leds snd_hwdep snd_pcm snd_timer snd ipmi_si soundcore ipmi_devintf ipmi_msghandler tcp_bbr sch_fq binder_dkms(OE) autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 multipath linear ses enclosure hibmc_drm hid_generic usbhid hid marvell aes_ce_blk aes_ce_cipher amdgpu i2c_algo_bit crct10dif_ce gpu_sched drm_vram_helper ttm ghash_ce sha2_ce drm_kms_helper syscopyarea sha256_arm64 sysfillrect sysimgblt fb_sys_fops sha1_ce drm hisi_sas_v2_hw hisi_sas_main libsas ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64 Sep 29 17:42:37 k8snode244 kernel: [952618.849759] CPU: 14 PID: 681455 Comm: kworker/14:0 Kdump: loaded Tainted: G W OE 5.3.15-050315.2020063001-generic #appstream Sep 29 17:42:37 k8snode244 kernel: [952618.849760] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.63 09/19/2019 Sep 29 17:42:37 k8snode244 kernel: [952618.849767] Workqueue: events drm_sched_job_timedout [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.849770] pstate: 40400005 (nZcv daif +PAN -UAO) Sep 29 
17:42:37 k8snode244 kernel: [952618.849774] pc : drm_sched_resubmit_jobs+0x188/0x1a8 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.849777] lr : drm_sched_resubmit_jobs+0x188/0x1a8 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.849778] sp : ffff00009c873c20 Sep 29 17:42:37 k8snode244 kernel: [952618.849779] x29: ffff00009c873c20 x28: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.849782] x27: ffff80901dc28000 x26: 00000000ffffff83 Sep 29 17:42:37 k8snode244 kernel: [952618.849784] x25: ffff0000094c66b8 x24: 00000000001fb320 Sep 29 17:42:37 k8snode244 kernel: [952618.849786] x23: 0000000000000001 x22: ffff801fd37a6bb8 Sep 29 17:42:37 k8snode244 kernel: [952618.849788] x21: ffff801fd37a6a30 x20: ffff801fd37a6b88 Sep 29 17:42:37 k8snode244 kernel: [952618.849790] x19: ffff80188be5ec00 x18: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.849792] x17: 0000000000000001 x16: 0000000000000007 Sep 29 17:42:37 k8snode244 kernel: [952618.849794] x15: 0000000000000000 x14: 0000000000002400 Sep 29 17:42:37 k8snode244 kernel: [952618.849796] x13: 0000000000000000 x12: ffff000011ba7000 Sep 29 17:42:37 k8snode244 kernel: [952618.849798] x11: 00000000000794fc x10: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.849800] x9 : 0000000000000001 x8 : 000000000001a923 Sep 29 17:42:37 k8snode244 kernel: [952618.849802] x7 : ffff000011ba7000 x6 : 00002b369e3d3bcd Sep 29 17:42:37 k8snode244 kernel: [952618.849804] x5 : 0000000000000001 x4 : ffff8017dbbc2248 Sep 29 17:42:37 k8snode244 kernel: [952618.849806] x3 : ffff8017dbbc2248 x2 : b5be5ef7ef51a000 Sep 29 17:42:37 k8snode244 kernel: [952618.849808] x1 : 0000000000000000 x0 : 0000000000000024 Sep 29 17:42:37 k8snode244 kernel: [952618.849810] Call trace: Sep 29 17:42:37 k8snode244 kernel: [952618.849813] drm_sched_resubmit_jobs+0x188/0x1a8 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.849894] amdgpu_device_gpu_recover+0x460/0xb58 [amdgpu] Sep 29 17:42:37 k8snode244 kernel: 
[952618.849973] amdgpu_job_timedout+0xe4/0x108 [amdgpu] Sep 29 17:42:37 k8snode244 kernel: [952618.849977] drm_sched_job_timedout+0x84/0xf8 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.849980] process_one_work+0x1ec/0x470 Sep 29 17:42:37 k8snode244 kernel: [952618.849982] worker_thread+0x48/0x458 Sep 29 17:42:37 k8snode244 kernel: [952618.849984] kthread+0x110/0x118 Sep 29 17:42:37 k8snode244 kernel: [952618.849986] ret_from_fork+0x10/0x18 Sep 29 17:42:37 k8snode244 kernel: [952618.849988] ---[ end trace 5b779f1dd4a6e6d0 ]--- Sep 29 17:42:37 k8snode244 kernel: [952618.850034] [drm] Skip scheduling IBs! Sep 29 17:42:37 k8snode244 kernel: [952618.850036] ------------[ cut here ]------------ Sep 29 17:42:37 k8snode244 kernel: [952618.850061] WARNING: CPU: 14 PID: 903 at /home/lwj/build/kernel/include/linux/dma-fence.h:513 drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850063] Modules linked in: sch_tbf veth xt_recent br_netfilter bridge stp llc xt_addrtype bpfilter ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs overlay nls_iso8859_1 ipmi_ssif snd_hda_intel snd_hda_codec joydev snd_hda_core input_leds snd_hwdep snd_pcm snd_timer snd ipmi_si soundcore ipmi_devintf ipmi_msghandler tcp_bbr sch_fq binder_dkms(OE) autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 multipath linear ses enclosure hibmc_drm hid_generic usbhid hid marvell aes_ce_blk aes_ce_cipher amdgpu i2c_algo_bit crct10dif_ce gpu_sched drm_vram_helper ttm ghash_ce sha2_ce drm_kms_helper syscopyarea sha256_arm64 sysfillrect sysimgblt fb_sys_fops sha1_ce drm hisi_sas_v2_hw hisi_sas_main libsas ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64 Sep 29 17:42:37 k8snode244 kernel: [952618.850098] CPU: 14 PID: 903 Comm: gfx Kdump: loaded Tainted: G W OE 5.3.15-050315.2020063001-generic #appstream Sep 29 17:42:37 k8snode244 
kernel: [952618.850099] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.63 09/19/2019 Sep 29 17:42:37 k8snode244 kernel: [952618.850101] pstate: 40400005 (nZcv daif +PAN -UAO) Sep 29 17:42:37 k8snode244 kernel: [952618.850104] pc : drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850107] lr : drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850108] sp : ffff00001c883dd0 Sep 29 17:42:37 k8snode244 kernel: [952618.850109] x29: ffff00001c883dd0 x28: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850111] x27: 0000000000000000 x26: ffff801fd37a6bc8 Sep 29 17:42:37 k8snode244 kernel: [952618.850113] x25: ffff000011b79000 x24: ffff8019e70e0858 Sep 29 17:42:37 k8snode244 kernel: [952618.850115] x23: ffff801fd37a6b18 x22: ffff801f14a45000 Sep 29 17:42:37 k8snode244 kernel: [952618.850116] x21: 0000000000000000 x20: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850118] x19: ffff801fd37a6a30 x18: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850120] x17: 0000000000000001 x16: 0000000000000007 Sep 29 17:42:37 k8snode244 kernel: [952618.850122] x15: 0000000000000000 x14: 0000000000002400 Sep 29 17:42:37 k8snode244 kernel: [952618.850123] x13: 0000000000000000 x12: ffff000011ba7000 Sep 29 17:42:37 k8snode244 kernel: [952618.850125] x11: 000000000007a0e4 x10: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850127] x9 : 0000000000000001 x8 : 000000000001a947 Sep 29 17:42:37 k8snode244 kernel: [952618.850129] x7 : ffff000011ba7000 x6 : 00002b369e3d3bcd Sep 29 17:42:37 k8snode244 kernel: [952618.850131] x5 : 0000000000000001 x4 : ffff8017dbbc2248 Sep 29 17:42:37 k8snode244 kernel: [952618.850133] x3 : ffff8017dbbc2248 x2 : b5be5ef7ef51a000 Sep 29 17:42:37 k8snode244 kernel: [952618.850135] x1 : 0000000000000000 x0 : 0000000000000024 Sep 29 17:42:37 k8snode244 kernel: [952618.850138] Call trace: Sep 29 17:42:37 k8snode244 kernel: [952618.850142] 
drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850144] kthread+0x110/0x118 Sep 29 17:42:37 k8snode244 kernel: [952618.850146] ret_from_fork+0x10/0x18 Sep 29 17:42:37 k8snode244 kernel: [952618.850149] ---[ end trace 5b779f1dd4a6e6d1 ]--- Sep 29 17:42:37 k8snode244 kernel: [952618.850168] [drm] Skip scheduling IBs! Sep 29 17:42:37 k8snode244 kernel: [952618.850170] ------------[ cut here ]------------ Sep 29 17:42:37 k8snode244 kernel: [952618.850190] WARNING: CPU: 14 PID: 903 at /home/lwj/build/kernel/include/linux/dma-fence.h:513 drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850191] Modules linked in: sch_tbf veth xt_recent br_netfilter bridge stp llc xt_addrtype bpfilter ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs overlay nls_iso8859_1 ipmi_ssif snd_hda_intel snd_hda_codec joydev snd_hda_core input_leds snd_hwdep snd_pcm snd_timer snd ipmi_si soundcore ipmi_devintf ipmi_msghandler tcp_bbr sch_fq binder_dkms(OE) autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 multipath linear ses enclosure hibmc_drm hid_generic usbhid hid marvell aes_ce_blk aes_ce_cipher amdgpu i2c_algo_bit crct10dif_ce gpu_sched drm_vram_helper ttm ghash_ce sha2_ce drm_kms_helper syscopyarea sha256_arm64 sysfillrect sysimgblt fb_sys_fops sha1_ce drm hisi_sas_v2_hw hisi_sas_main libsas ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64 Sep 29 17:42:37 k8snode244 kernel: [952618.850224] CPU: 14 PID: 903 Comm: gfx Kdump: loaded Tainted: G W OE 5.3.15-050315.2020063001-generic #appstream Sep 29 17:42:37 k8snode244 kernel: [952618.850226] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.63 09/19/2019 Sep 29 17:42:37 k8snode244 kernel: [952618.850228] pstate: 40400005 (nZcv daif +PAN -UAO) Sep 29 17:42:37 k8snode244 kernel: [952618.850231] pc : drm_sched_main+0x2d4/0x2e0 
[gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850234] lr : drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850235] sp : ffff00001c883dd0 Sep 29 17:42:37 k8snode244 kernel: [952618.850237] x29: ffff00001c883dd0 x28: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850239] x27: 0000000000000000 x26: ffff801fd37a6bc8 Sep 29 17:42:37 k8snode244 kernel: [952618.850241] x25: ffff000011b79000 x24: ffff8017d1ae2c58 Sep 29 17:42:37 k8snode244 kernel: [952618.850243] x23: ffff801fd37a6b18 x22: ffff8011c9bcea00 Sep 29 17:42:37 k8snode244 kernel: [952618.850245] x21: 0000000000000000 x20: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850247] x19: ffff801fd37a6a30 x18: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850248] x17: 0000000000000001 x16: 0000000000000007 Sep 29 17:42:37 k8snode244 kernel: [952618.850250] x15: 0000000000000000 x14: 0000000000002400 Sep 29 17:42:37 k8snode244 kernel: [952618.850252] x13: 0000000000000000 x12: ffff000011ba7000 Sep 29 17:42:37 k8snode244 kernel: [952618.850253] x11: 000000000007ab84 x10: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850255] x9 : 0000000000000001 x8 : 000000000001a965 Sep 29 17:42:37 k8snode244 kernel: [952618.850257] x7 : ffff000011ba7000 x6 : 00002b369e3d3bcd Sep 29 17:42:37 k8snode244 kernel: [952618.850259] x5 : 0000000000000001 x4 : ffff8017dbbc2248 Sep 29 17:42:37 k8snode244 kernel: [952618.850261] x3 : ffff8017dbbc2248 x2 : b5be5ef7ef51a000 Sep 29 17:42:37 k8snode244 kernel: [952618.850263] x1 : 0000000000000000 x0 : 0000000000000024 Sep 29 17:42:37 k8snode244 kernel: [952618.850265] Call trace: Sep 29 17:42:37 k8snode244 kernel: [952618.850269] drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850271] kthread+0x110/0x118 Sep 29 17:42:37 k8snode244 kernel: [952618.850274] ret_from_fork+0x10/0x18 Sep 29 17:42:37 k8snode244 kernel: [952618.850275] ---[ end trace 5b779f1dd4a6e6d2 ]--- Sep 29 
17:42:37 k8snode244 kernel: [952618.850298] [drm] Skip scheduling IBs! Sep 29 17:42:37 k8snode244 kernel: [952618.850299] ------------[ cut here ]------------ Sep 29 17:42:37 k8snode244 kernel: [952618.850320] WARNING: CPU: 14 PID: 903 at /home/lwj/build/kernel/include/linux/dma-fence.h:513 drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850321] Modules linked in: sch_tbf veth xt_recent br_netfilter bridge stp llc xt_addrtype bpfilter ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs overlay nls_iso8859_1 ipmi_ssif snd_hda_intel snd_hda_codec joydev snd_hda_core input_leds snd_hwdep snd_pcm snd_timer snd ipmi_si soundcore ipmi_devintf ipmi_msghandler tcp_bbr sch_fq binder_dkms(OE) autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 multipath linear ses enclosure hibmc_drm hid_generic usbhid hid marvell aes_ce_blk aes_ce_cipher amdgpu i2c_algo_bit crct10dif_ce gpu_sched drm_vram_helper ttm ghash_ce sha2_ce drm_kms_helper syscopyarea sha256_arm64 sysfillrect sysimgblt fb_sys_fops sha1_ce drm hisi_sas_v2_hw hisi_sas_main libsas ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64 Sep 29 17:42:37 k8snode244 kernel: [952618.850355] CPU: 14 PID: 903 Comm: gfx Kdump: loaded Tainted: G W OE 5.3.15-050315.2020063001-generic #appstream Sep 29 17:42:37 k8snode244 kernel: [952618.850356] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.63 09/19/2019 Sep 29 17:42:37 k8snode244 kernel: [952618.850358] pstate: 40400005 (nZcv daif +PAN -UAO) Sep 29 17:42:37 k8snode244 kernel: [952618.850361] pc : drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850364] lr : drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850366] sp : ffff00001c883dd0 Sep 29 17:42:37 k8snode244 kernel: [952618.850367] x29: ffff00001c883dd0 x28: 0000000000000000 Sep 29 17:42:37 
k8snode244 kernel: [952618.850369] x27: 0000000000000000 x26: ffff801fd37a6bc8 Sep 29 17:42:37 k8snode244 kernel: [952618.850371] x25: ffff000011b79000 x24: ffff80196cad9c58 Sep 29 17:42:37 k8snode244 kernel: [952618.850373] x23: ffff801fd37a6b18 x22: ffff8017cb683900 Sep 29 17:42:37 k8snode244 kernel: [952618.850375] x21: 0000000000000000 x20: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850377] x19: ffff801fd37a6a30 x18: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850379] x17: 0000000000000001 x16: 0000000000000007 Sep 29 17:42:37 k8snode244 kernel: [952618.850380] x15: 0000000000000000 x14: 0000000000002400 Sep 29 17:42:37 k8snode244 kernel: [952618.850382] x13: 0000000000000000 x12: ffff000011ba7000 Sep 29 17:42:37 k8snode244 kernel: [952618.850384] x11: 000000000007b5d0 x10: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850386] x9 : 0000000000000001 x8 : 000000000001a983 Sep 29 17:42:37 k8snode244 kernel: [952618.850388] x7 : ffff000011ba7000 x6 : 00002b369e3d3bcd Sep 29 17:42:37 k8snode244 kernel: [952618.850390] x5 : 0000000000000001 x4 : ffff8017dbbc2248 Sep 29 17:42:37 k8snode244 kernel: [952618.850392] x3 : ffff8017dbbc2248 x2 : b5be5ef7ef51a000 Sep 29 17:42:37 k8snode244 kernel: [952618.850395] x1 : 0000000000000000 x0 : 0000000000000024 Sep 29 17:42:37 k8snode244 kernel: [952618.850397] Call trace: Sep 29 17:42:37 k8snode244 kernel: [952618.850400] drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850403] kthread+0x110/0x118 Sep 29 17:42:37 k8snode244 kernel: [952618.850405] ret_from_fork+0x10/0x18 Sep 29 17:42:37 k8snode244 kernel: [952618.850406] ---[ end trace 5b779f1dd4a6e6d3 ]--- Sep 29 17:42:37 k8snode244 kernel: [952618.850461] [drm] Skip scheduling IBs! 
Sep 29 17:42:37 k8snode244 kernel: [952618.850463] ------------[ cut here ]------------ Sep 29 17:42:37 k8snode244 kernel: [952618.850483] WARNING: CPU: 14 PID: 903 at /home/lwj/build/kernel/include/linux/dma-fence.h:513 drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850484] Modules linked in: sch_tbf veth xt_recent br_netfilter bridge stp llc xt_addrtype bpfilter ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs overlay nls_iso8859_1 ipmi_ssif snd_hda_intel snd_hda_codec joydev snd_hda_core input_leds snd_hwdep snd_pcm snd_timer snd ipmi_si soundcore ipmi_devintf ipmi_msghandler tcp_bbr sch_fq binder_dkms(OE) autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 multipath linear ses enclosure hibmc_drm hid_generic usbhid hid marvell aes_ce_blk aes_ce_cipher amdgpu i2c_algo_bit crct10dif_ce gpu_sched drm_vram_helper ttm ghash_ce sha2_ce drm_kms_helper syscopyarea sha256_arm64 sysfillrect sysimgblt fb_sys_fops sha1_ce drm hisi_sas_v2_hw hisi_sas_main libsas ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio hnae aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64 Sep 29 17:42:37 k8snode244 kernel: [952618.850519] CPU: 14 PID: 903 Comm: gfx Kdump: loaded Tainted: G W OE 5.3.15-050315.2020063001-generic #appstream Sep 29 17:42:37 k8snode244 kernel: [952618.850520] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.63 09/19/2019 Sep 29 17:42:37 k8snode244 kernel: [952618.850522] pstate: 40400005 (nZcv daif +PAN -UAO) Sep 29 17:42:37 k8snode244 kernel: [952618.850525] pc : drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850528] lr : drm_sched_main+0x2d4/0x2e0 [gpu_sched] Sep 29 17:42:37 k8snode244 kernel: [952618.850529] sp : ffff00001c883dd0 Sep 29 17:42:37 k8snode244 kernel: [952618.850530] x29: ffff00001c883dd0 x28: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850531] x27: 0000000000000000 x26: 
ffff801fd37a6bc8 Sep 29 17:42:37 k8snode244 kernel: [952618.850533] x25: ffff000011b79000 x24: ffff808ca5534458 Sep 29 17:42:37 k8snode244 kernel: [952618.850535] x23: ffff801fd37a6b18 x22: ffff80161a411200 Sep 29 17:42:37 k8snode244 kernel: [952618.850536] x21: 0000000000000000 x20: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850538] x19: ffff801fd37a6a30 x18: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850540] x17: 0000000000000001 x16: 0000000000000007 Sep 29 17:42:37 k8snode244 kernel: [952618.850541] x15: 0000000000000000 x14: 0000000000002400 Sep 29 17:42:37 k8snode244 kernel: [952618.850543] x13: 0000000000000000 x12: ffff000011ba7000 Sep 29 17:42:37 k8snode244 kernel: [952618.850546] x11: 000000000007c018 x10: 0000000000000000 Sep 29 17:42:37 k8snode244 kernel: [952618.850548] x9 : 0000000000000001 x8 : 000000000001a9a1 Sep 29 17:42:37 k8snode244 kernel: [952618.850550] x7 : ffff000011ba7000 x6 : 00002b369e3d3bcd Sep 29 17:42:37 k8snode244 kernel: [952618.850552] x5 : 0000000000000001 x4 : ffff8017dbbc2248 Sep 29 17:42:37 k8snode244 kernel: [952618.850553] x3 : ffff8017dbbc2248 x2 : b5be5ef7ef51a000 Sep 29 17:42:37 k8snode244 kernel: [952618.850555] x1 : 0000000000000000 x0 : 0000000000000024 +++++++++++keylog++++++++++++++++++ Sep 29 17:42:35 k8snode244 kernel: [952617.235578] amdgpu 0005:01:00.0: GPU reset begin! 
Sep 29 17:42:35 k8snode244 kernel: [952617.236276] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]]*ERROR* suspend of IP block <vce_v3_0> failed -22
Sep 29 17:42:36 k8snode244 kernel: [952617.842417] amdgpu 0005:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]*ERROR* ring kiq_2.1.0 test failed (-110)
Sep 29 17:42:36 k8snode244 kernel: [952617.842500] [drm:gfx_v8_0_hw_fini [amdgpu]]*ERROR* KCQ disable failed
Sep 29 17:42:36 k8snode244 kernel: [952618.098569] cp is busy, skip halt cp
Sep 29 17:42:36 k8snode244 kernel: [952618.356730] rlc is busy, skip halt rlc
Sep 29 17:42:36 k8snode244 kernel: [952618.357783] amdgpu 0005:01:00.0: GPU pci config reset
Sep 29 17:42:36 k8snode244 kernel: [952618.476296] amdgpu 0005:01:00.0: GPU reset succeeded, trying to resume
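
For anyone triaging similar reports, here is a rough Python sketch that pulls the useful numbers out of the fault and timeout lines at the top of the log. The bit layout used for VM_CONTEXT1_PROTECTION_FAULT_STATUS is an assumption: it was chosen because it reproduces the values the kernel itself prints for this fault (0x0c, vmid 7, read, 72), and it has not been checked against the register headers. The helper names decode_fault and pending_jobs are made up for this example.

```python
import re

# Matches the amdgpu_job_timedout message format seen in the log above.
TIMEOUT_RE = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)


def decode_fault(status: int, mc_client: int) -> dict:
    """Split a fault status word into fields.

    ASSUMPTION: the field positions below are inferred from this one log
    (they reproduce the kernel's own decoded 'VM fault' line); treat them
    as a guess for other ASICs or kernel versions.
    """
    return {
        "protections": status & 0xFF,          # printed as 0x0c
        "mc_id": (status >> 12) & 0x1FF,       # printed as (72)
        "write": bool((status >> 24) & 0x1),   # 0 means "read from"
        "vmid": (status >> 25) & 0xF,          # printed as vmid 7
        # The client word is four ASCII bytes, most significant first.
        "client": mc_client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii"),
    }


def pending_jobs(line: str):
    """Return (ring, emitted - signaled) for a ring timeout line, else None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    ring, signaled, emitted = m.group(1), int(m.group(2)), int(m.group(3))
    return ring, emitted - signaled


fault = decode_fault(0x0E04800C, 0x54433400)
timeout = pending_jobs(
    "[drm:amdgpu_job_timedout [amdgpu]]*ERROR* ring gfx timeout, "
    "signaled seq=313011137, emitted seq=313011140"
)
print(fault)    # vmid 7, mc_id 72, client 'TC4', write False
print(timeout)  # ('gfx', 3)
```

With the values from this report, the sketch says three fences were still outstanding on the gfx ring when the timeout fired, and the faulting access was a read by client TC4 in vmid 7 at page 0.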