[Public]

> -----Original Message-----
> From: Cornwall, Jay <Jay.Cornwall@xxxxxxx>
> Sent: Thursday, January 16, 2025 3:41 PM
> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Cornwall, Jay <Jay.Cornwall@xxxxxxx>; Kim, Jonathan <Jonathan.Kim@xxxxxxx>
> Subject: [PATCH] drm/amdkfd: Block per-queue reset when halt_if_hws_hang=1
>
> The purpose of halt_if_hws_hang is to preserve GPU state for driver
> debugging when queue preemption fails. Issuing per-queue reset may
> kill wavefronts which caused the preemption failure.
>
> Signed-off-by: Jay Cornwall <jay.cornwall@xxxxxxx>
> Cc: Jonathan Kim <Jonathan.Kim@xxxxxxx>

Reviewed-by: Jonathan Kim <jonathan.kim@xxxxxxx>

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index f157494bfdb1..195085079eb2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -2327,9 +2327,9 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm,
>  	 */
>  	mqd_mgr = dqm->mqd_mgrs[KFD_MQD_TYPE_HIQ];
>  	if (mqd_mgr->check_preemption_failed(mqd_mgr, dqm->packet_mgr.priv_queue->queue->mqd)) {
> +		while (halt_if_hws_hang)
> +			schedule();
>  		if (reset_queues_on_hws_hang(dqm)) {
> -			while (halt_if_hws_hang)
> -				schedule();
>  			dqm->is_hws_hang = true;
>  			kfd_hws_hang(dqm);
>  			retval = -ETIME;
> --
> 2.34.1