On 2025-03-09 23:01, Yifan Zha wrote:
> [Why]
> If a reset is detected while kfd needs to evict working queues, removing a
> queue through HWS fails. The remaining queues are then not evicted and
> stay in the active state.
>
> After the reset is done, kfd uses HWS to terminate the remaining active
> queues, but HWS has been reset, so removing the queues fails again.
>
> [How]
> Keep removing all queues even if HWS returns a failure.
> This does not affect cpsch, as it checks reset_domain->sem.
>
> v2: If any queue fails, evict queue returns an error.
> v3: Declare err inside the if-block.
>
> Signed-off-by: Yifan Zha <Yifan.Zha@xxxxxxx>

Reviewed-by: Felix Kuehling <felix.kuehling@xxxxxxx>

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 885e0e9cf21b..2ed003d3ff0e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1221,11 +1221,13 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
>  		decrement_queue_count(dqm, qpd, q);
>
>  		if (dqm->dev->kfd->shared_resources.enable_mes) {
> -			retval = remove_queue_mes(dqm, q, qpd);
> -			if (retval) {
> +			int err;
> +
> +			err = remove_queue_mes(dqm, q, qpd);
> +			if (err) {
>  				dev_err(dev, "Failed to evict queue %d\n",
>  					q->properties.queue_id);
> -				goto out;
> +				retval = err;
>  			}
>  		}
>  	}
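For readers without the kernel tree at hand, the behavioural change in the hunk above can be sketched in user space. This is only an illustration under stated assumptions, not the driver code: struct demo_queue, demo_remove_queue() and demo_evict_all() are hypothetical stand-ins for the KFD queue, remove_queue_mes() and the loop in evict_process_queues_cpsch(), and -EIO is an arbitrary error code chosen for the demo.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a KFD queue; not the real struct queue. */
struct demo_queue {
	int id;
	bool active;
	bool hw_fails;	/* simulate the scheduler rejecting the removal */
};

/*
 * Hypothetical stand-in for remove_queue_mes():
 * returns 0 on success or a negative errno on failure.
 */
static int demo_remove_queue(const struct demo_queue *q)
{
	return q->hw_fails ? -EIO : 0;
}

/*
 * Evict every active queue. A failed removal is logged and remembered,
 * but the loop keeps going so that no later queue is left active.
 */
static int demo_evict_all(struct demo_queue *queues, size_t n)
{
	int retval = 0;

	for (size_t i = 0; i < n; i++) {
		struct demo_queue *q = &queues[i];
		int err;

		if (!q->active)
			continue;

		/* Software bookkeeping happens for every queue. */
		q->active = false;

		err = demo_remove_queue(q);
		if (err) {
			fprintf(stderr, "Failed to evict queue %d\n", q->id);
			retval = err;	/* report the failure, do not bail out */
		}
	}

	return retval;
}
```

With the old "goto out" behaviour, a failure on the second of three queues would leave the third one marked active. In this sketch all three end up inactive and the caller still sees the error. If several removals fail, only the last error code is returned, which matches the "retval = err" assignment in the patch.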