On 2025-03-05 00:42, Yifan Zha wrote:
[Why]
If reset is detected and kfd need to evict working queues, HWS moving
queue will be failed. Then remaining queues are not evicted and in
active state.

After reset done, kfd uses HWS to termination remaining activated
queues but HWS is resetted. So remove queue will be failed again.

[How]
Keep removing all queues even if HWS returns failed.
It will not affect cpsch as it checks reset_domain->sem.

Signed-off-by: Yifan Zha <Yifan.Zha@xxxxxxx>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index f3f2fd6ee65c..b213a845bd5b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1223,7 +1223,6 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
 			if (retval) {
 				dev_err(dev, "Failed to evict queue %d\n",
 					q->properties.queue_id);
-				goto out;
Is every subsequent call to remove_queue_mes guaranteed to also return an error? If not, you need a way to make sure an error is returned if any queue failed to be removed even if the last queue succeeded.
Regards, Felix
 			}
 		}
 	}
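
One way to address the concern above is to remember the first failure while still visiting every queue, so that a later success cannot overwrite an earlier error. The following is only a minimal user-space sketch of that pattern, not the driver code: struct queue, remove_queue_stub() and evict_all() are hypothetical stand-ins invented for illustration, and remove_queue_stub() merely imitates a call such as remove_queue_mes() that may fail for some queues.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a queue; not the kfd struct queue. */
struct queue {
	int id;
	int fail;     /* nonzero: simulate a failing removal */
	int evicted;  /* set to 1 once the queue has been removed */
};

/* Hypothetical stand-in for remove_queue_mes(): 0 on success, -EIO on failure. */
static int remove_queue_stub(struct queue *q)
{
	if (q->fail)
		return -EIO;
	q->evicted = 1;
	return 0;
}

/*
 * Try to evict every queue. Do not stop at the first failure, but
 * return the first error seen so the caller still learns that at
 * least one queue could not be removed.
 */
static int evict_all(struct queue *queues, int count)
{
	int ret = 0;

	for (int i = 0; i < count; i++) {
		int err = remove_queue_stub(&queues[i]);

		if (err) {
			fprintf(stderr, "Failed to evict queue %d\n", queues[i].id);
			if (!ret)
				ret = err; /* keep the first error only */
		}
	}

	return ret;
}
```

With three queues where only the middle one fails, evict_all() still evicts the first and last queue and returns -EIO. Whether the first or the last error should be reported is a design choice; the sketch keeps the first.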