RE: [PATCH] drm/amdkfd: Block per-queue reset when halt_if_hws_hang=1

[Public]

> -----Original Message-----
> From: Cornwall, Jay <Jay.Cornwall@xxxxxxx>
> Sent: Thursday, January 16, 2025 3:41 PM
> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Cornwall, Jay <Jay.Cornwall@xxxxxxx>; Kim, Jonathan
> <Jonathan.Kim@xxxxxxx>
> Subject: [PATCH] drm/amdkfd: Block per-queue reset when halt_if_hws_hang=1
>
> The purpose of halt_if_hws_hang is to preserve GPU state for driver
> debugging when queue preemption fails. Issuing per-queue reset may
> kill wavefronts which caused the preemption failure.
>
> Signed-off-by: Jay Cornwall <jay.cornwall@xxxxxxx>
> Cc: Jonathan Kim <Jonathan.Kim@xxxxxxx>

Reviewed-by: Jonathan Kim <jonathan.kim@xxxxxxx>

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index f157494bfdb1..195085079eb2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -2327,9 +2327,9 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm,
>        */
>       mqd_mgr = dqm->mqd_mgrs[KFD_MQD_TYPE_HIQ];
>       if (mqd_mgr->check_preemption_failed(mqd_mgr, dqm->packet_mgr.priv_queue->queue->mqd)) {
> +             while (halt_if_hws_hang)
> +                     schedule();
>               if (reset_queues_on_hws_hang(dqm)) {
> -                     while (halt_if_hws_hang)
> -                             schedule();
>                       dqm->is_hws_hang = true;
>                       kfd_hws_hang(dqm);
>                       retval = -ETIME;
> --
> 2.34.1
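
For readers following the ordering change, here is a minimal user-space
sketch of the two control flows. It is not kernel code. The struct, enum,
and function names (model, action, handle_old, handle_new) are hypothetical
stand-ins, and it assumes, as the diff suggests, that a non-zero return
from reset_queues_on_hws_hang() means the per-queue reset did not recover
the queues. The kernel's "while (halt_if_hws_hang) schedule();" loop is
modelled as returning ACT_HALT instead of spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of handling a preemption failure. */
enum action { ACT_NONE, ACT_HALT, ACT_QUEUE_RESET_OK, ACT_FULL_HANG };

/* Hypothetical model of the state the patch cares about. */
struct model {
	bool preemption_failed;  /* stands in for check_preemption_failed() */
	bool halt_if_hws_hang;   /* stands in for the module parameter */
	bool queue_reset_fails;  /* stands in for a non-zero reset return */
	bool queues_were_reset;  /* set once a per-queue reset is attempted */
};

/* Ordering before the patch: reset first, halt only if the reset fails. */
static enum action handle_old(struct model *m)
{
	if (!m->preemption_failed)
		return ACT_NONE;

	m->queues_were_reset = true;  /* GPU state is disturbed here */
	if (m->queue_reset_fails) {
		if (m->halt_if_hws_hang)
			return ACT_HALT;  /* too late: reset already ran */
		return ACT_FULL_HANG;
	}
	return ACT_QUEUE_RESET_OK;
}

/* Ordering after the patch: halt before any reset is attempted. */
static enum action handle_new(struct model *m)
{
	if (!m->preemption_failed)
		return ACT_NONE;

	if (m->halt_if_hws_hang)
		return ACT_HALT;  /* GPU state left untouched for debugging */

	m->queues_were_reset = true;
	if (m->queue_reset_fails)
		return ACT_FULL_HANG;
	return ACT_QUEUE_RESET_OK;
}
```

With halt_if_hws_hang set and a preemption failure, handle_old() attempts
the reset before it can halt, while handle_new() halts with
queues_were_reset still false, which is the behaviour the commit message
asks for.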




