If the GPU is in the middle of a reset, writing to it can cause an
unpredictable general protection fault, as shown in the call trace below.
Disallow using the kfd debugfs hang_hws interface to hang the HWS while
the GPU is resetting.

[12808.234114] general protection fault: 0000 [#1] SMP NOPTI
[12808.234119] CPU: 13 PID: 6334 Comm: tee Tainted: G OE 5.4.0-77-generic #86-Ubuntu
[12808.234121] Hardware name: ASUS System Product Name/Pro WS WRX80E-SAGE SE WIFI, BIOS 0211 11/27/2020
[12808.234220] RIP: 0010:kq_submit_packet+0xd/0x50 [amdgpu]
[12808.234222] Code: 8b 45 d0 48 c7 00 00 00 00 00 b8 f4 ff ff ff eb df 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 55 48 8b 17 48 8b 47 48 <48> 8b 52 08 48 89 e5 83 7a 20 08 74 14 8b 77 20 89 30 48 8b 47 10
[12808.234224] RSP: 0018:ffffb0bf4954bdc0 EFLAGS: 00010216
[12808.234226] RAX: ffffb0bf4a1a5a00 RBX: ffff99302895c0c8 RCX: 0000000000000000
[12808.234227] RDX: c3156d43d3a04949 RSI: 0000000000000055 RDI: ffff99302584c300
[12808.234228] RBP: ffffb0bf4954bdf8 R08: 0000000000000543 R09: ffffb0bf4a1a4230
[12808.234229] R10: 000000000000000a R11: f000000000000000 R12: 0000000000000000
[12808.234230] R13: ffff99302895c0d8 R14: 00007ffebb3d18f0 R15: 0000000000000005
[12808.234232] FS: 00007f0d822ef580(0000) GS:ffff99307d340000(0000) knlGS:0000000000000000
[12808.234233] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12808.234234] CR2: 00007ffebb3d1908 CR3: 0000001efe1ec000 CR4: 0000000000340ee0
[12808.234235] Call Trace:
[12808.234324]  ? pm_debugfs_hang_hws+0x71/0xd0 [amdgpu]
[12808.234408]  kfd_debugfs_hang_hws+0x2e/0x50 [amdgpu]
[12808.234494]  kfd_debugfs_hang_hws_write+0xb6/0xc0 [amdgpu]
[12808.234499]  full_proxy_write+0x5c/0x90
[12808.234502]  __vfs_write+0x1b/0x40
[12808.234504]  vfs_write+0xb9/0x1a0
[12808.234506]  ksys_write+0x67/0xe0
[12808.234508]  __x64_sys_write+0x1a/0x20
[12808.234511]  do_syscall_64+0x57/0x190
[12808.234514]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Oak Zeng <Oak.Zeng@xxxxxxx>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 9e4a05e..fc77d03 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -1390,6 +1390,11 @@ int kfd_debugfs_hang_hws(struct kfd_dev *dev)
 		return -EINVAL;
 	}
 
+	if (dev->dqm->is_resetting) {
+		pr_err("HWS is already resetting, please wait for the current reset to finish\n");
+		return -EBUSY;
+	}
+
 	r = pm_debugfs_hang_hws(&dev->dqm->packets);
 	if (!r)
 		r = dqm_debugfs_execute_queues(dev->dqm);
-- 
2.7.4