Re: [PATCH v3] drm/amd/amdkfd: Evict all queues even HWS remove queue failed

On 2025-03-09 23:01, Yifan Zha wrote:
> [Why]
> If a reset is detected and kfd needs to evict working queues, removing a queue through HWS fails.
> The remaining queues are then not evicted and stay in the active state.
>
> After the reset is done, kfd uses HWS to terminate the remaining active queues, but HWS has been reset.
> So removing the queues fails again.
>
> [How]
> Keep removing all queues even if HWS returns a failure.
> This does not affect cpsch, because it checks reset_domain->sem.
>
> v2: If any queue fails, evict queue returns an error.
> v3: Declare err inside the if-block.
>
> Signed-off-by: Yifan Zha <Yifan.Zha@xxxxxxx>

Reviewed-by: Felix Kuehling <felix.kuehling@xxxxxxx>


> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 885e0e9cf21b..2ed003d3ff0e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1221,11 +1221,13 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
>  		decrement_queue_count(dqm, qpd, q);
>  
>  		if (dqm->dev->kfd->shared_resources.enable_mes) {
> -			retval = remove_queue_mes(dqm, q, qpd);
> -			if (retval) {
> +			int err;
> +
> +			err = remove_queue_mes(dqm, q, qpd);
> +			if (err) {
>  				dev_err(dev, "Failed to evict queue %d\n",
>  					q->properties.queue_id);
> -				goto out;
> +				retval = err;
>  			}
>  		}
>  	}
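
For anyone following the thread, the control-flow change can be sketched in isolation. This is a minimal user-space illustration under stated assumptions, not the driver code: struct fake_queue, fake_remove_queue() and evict_all() are hypothetical stand-ins, and -5 is used only as an errno-style placeholder. It shows the loop recording a failure in retval and carrying on, instead of jumping out at the first error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a queue; not the real struct queue. */
struct fake_queue {
	int id;
	int fail;	/* nonzero: removing this queue fails */
	int attempted;	/* set once removal has been tried */
};

/* Hypothetical stand-in for remove_queue_mes(): 0 on success, negative on error. */
static int fake_remove_queue(struct fake_queue *q)
{
	q->attempted = 1;
	return q->fail ? -5 : 0;	/* -5 is a placeholder error code */
}

/*
 * Try to remove every queue. A failure is logged and remembered in
 * retval, but the loop continues so later queues are still processed.
 */
static int evict_all(struct fake_queue *queues, size_t n)
{
	int retval = 0;
	size_t i;

	assert(queues != NULL || n == 0);

	for (i = 0; i < n; i++) {
		int err;

		err = fake_remove_queue(&queues[i]);
		if (err) {
			fprintf(stderr, "Failed to evict queue %d\n",
				queues[i].id);
			retval = err;	/* keep going; report failure at the end */
		}
	}

	return retval;
}
```

With this shape, a failing queue in the middle of the list no longer prevents the queues after it from being handled. If several removals fail, the value returned is the error from the last failing queue, since retval is overwritten on each failure.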


