If a queue is being destroyed but causes a HWS hang on removal, the KFD may issue an unnecessary gpu reset if the destroyed queue can be fixed by a queue reset. This is because the queue has been removed from the KFD's queue list prior to the preemption action on destroy so the reset call will fail to match the HQD PQ reset information against the KFD's queue record to do the actual reset. To fix this, deactivate the queue prior to preemption since it's being destroyed anyways and remove the queue from the KFD's queue list after preemption. v2: early deactivate queue and delete queue from list later as-per description instead of destroy queue referencing hack. Signed-off-by: Jonathan Kim <jonathan.kim@xxxxxxx> --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 577d121cc6d1..6d5a632b95eb 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -2407,10 +2407,10 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm, pdd->sdma_past_activity_counter += sdma_val; } - list_del(&q->list); qpd->queue_count--; if (q->properties.is_active) { decrement_queue_count(dqm, qpd, q); + q->properties.is_active = false; if (!dqm->dev->kfd->shared_resources.enable_mes) { retval = execute_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0, @@ -2421,6 +2421,7 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm, retval = remove_queue_mes(dqm, q, qpd); } } + list_del(&q->list); /* * Unconditionally decrement this counter, regardless of the queue's -- 2.34.1