The debugger for GFX9.4.1 uses kfd_suspend_all_processes to pause the
compute pipeline so it can safely toggle the SQ's implicit wait on barrier
setting during debug attach/detach to work around the wave exception
s_barrier race condition.

For mGPU setups, repeated calls to cancel all outstanding restore work
can result in an asymmetric, permanent cancellation of the restore work
from the debug device after it has toggled the HW workaround settings.

Instead of cancelling the outstanding restore work, just flush it, as it
will be properly evicted anyway by the current suspend call.

Signed-off-by: Jonathan Kim <jonathan.kim@xxxxxxx>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 1e3795e7e18d..55a4ddd35e12 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -2008,7 +2008,7 @@ void kfd_suspend_all_processes(void)
 	WARN(debug_evictions, "Evicting all processes");
 	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
 		cancel_delayed_work_sync(&p->eviction_work);
-		cancel_delayed_work_sync(&p->restore_work);
+		flush_delayed_work(&p->restore_work);
 
 		if (kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_TRIGGER_SUSPEND))
 			pr_err("Failed to suspend process 0x%x\n", p->pasid);
-- 
2.25.1
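
For readers less familiar with the two workqueue calls, the difference this
patch relies on can be sketched in user space. This is not kernel code and
not the driver's implementation: struct toy_delayed_work, toy_queue,
toy_cancel, toy_flush and the two demo helpers are hypothetical names, and
the model ignores timers and concurrency. It only illustrates that
cancelling drops a pending restore, while flushing runs it to completion
before the caller continues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel delayed_work item. */
struct toy_delayed_work {
	bool pending;	/* queued but handler has not run yet */
	int runs;	/* how many times the restore handler ran */
};

/* Mark the work as queued (loosely models scheduling delayed work). */
static void toy_queue(struct toy_delayed_work *w)
{
	assert(w != NULL);
	w->pending = true;
}

/* Loosely models cancel_delayed_work_sync(): pending work never runs. */
static bool toy_cancel(struct toy_delayed_work *w)
{
	bool was_pending;

	assert(w != NULL);
	was_pending = w->pending;
	w->pending = false;
	return was_pending;
}

/* Loosely models flush_delayed_work(): pending work runs before returning. */
static bool toy_flush(struct toy_delayed_work *w)
{
	assert(w != NULL);
	if (!w->pending)
		return false;
	w->pending = false;
	w->runs++;	/* the restore handler executes here */
	return true;
}

/* Old behaviour: the queued restore is dropped. */
int toy_demo_cancel(void)
{
	struct toy_delayed_work w = { false, 0 };

	toy_queue(&w);
	toy_cancel(&w);
	return w.runs;
}

/* New behaviour: the queued restore completes, then suspend can evict. */
int toy_demo_flush(void)
{
	struct toy_delayed_work w = { false, 0 };

	toy_queue(&w);
	toy_flush(&w);
	return w.runs;
}
```

In this model toy_demo_cancel() returns 0 (the restore never ran) and
toy_demo_flush() returns 1 (the restore ran once before the caller moved
on), which mirrors why the patch lets the restore finish and leaves the
eviction to the suspend path.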