Re: [PATCH v2] drm/amd/amdkfd: Evict all queues even HWS remove queue failed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[AMD Official Use Only - AMD Internal Distribution Only]


Thanks Felix. Declaring err inside the if block is better. I have submitted patch, could you please help to review it?

Thanks.

 

Best regard,

Yifan Zha

 



From: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Sent: Saturday, March 8, 2025 5:00 AM
To: Zha, YiFan(Even) <Yifan.Zha@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Cc: Chang, HaiJun <HaiJun.Chang@xxxxxxx>; Chen, Horace <Horace.Chen@xxxxxxx>; Yin, ZhenGuo (Chris) <ZhenGuo.Yin@xxxxxxx>
Subject: Re: [PATCH v2] drm/amd/amdkfd: Evict all queues even HWS remove queue failed
 

On 2025-03-07 03:53, Yifan Zha wrote:
> [Why]
> If reset is detected and kfd need to evict working queues, HWS moving queue will be failed.
> Then remaining queues are not evicted and in active state.
>
> After reset done, kfd uses HWS to termination remaining activated queues but HWS is resetted.
> So remove queue will be failed again.
>
> [How]
> Keep removing all queues even if HWS returns failed.
> It will not affect cpsch as it checks reset_domain->sem.
>
> v2: If any queue failed, evict queue returns error.
>
> Signed-off-by: Yifan Zha <Yifan.Zha@xxxxxxx>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index f3f2fd6ee65c..b647745ee0a5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1189,7 +1189,7 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
>        struct queue *q;
>        struct device *dev = dqm->dev->adev->dev;
>        struct kfd_process_device *pdd;
> -     int retval = 0;
> +     int retval, err = 0;

You should still initialize retval to 0, otherwise the function may
return an uninitialized value if there are no MES queues. err does not
need to be initialized because you're always assigning a value just
before checking it below.

In fact, you could declare err inside the if-block below, since it is
only needed in that scope. It is preferred to declare variables in a
more limited scope if they are not needed outside.

Regards,
   Felix


>  
>        dqm_lock(dqm);
>        if (qpd->evicted++ > 0) /* already evicted, do nothing */
> @@ -1219,11 +1219,11 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
>                decrement_queue_count(dqm, qpd, q);
>  
>                if (dqm->dev->kfd->shared_resources.enable_mes) {
> -                     retval = remove_queue_mes(dqm, q, qpd);
> -                     if (retval) {
> +                     err = remove_queue_mes(dqm, q, qpd);
> +                     if (err) {
>                                dev_err(dev, "Failed to evict queue %d\n",
>                                        q->properties.queue_id);
> -                             goto out;
> +                             retval = err;
>                        }
>                }
>        }

[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux