[AMD Official Use Only]

OK, sounds reasonable.

Thanks,
Shaoyun.liu

-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Sent: Monday, November 15, 2021 11:07 AM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Liu, Shaoyun <Shaoyun.Liu@xxxxxxx>
Subject: Re: [PATCH] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again

On 2021-11-14 at 12:55 p.m., shaoyunl wrote:
> In an SRIOV configuration, a reset may fail to bring the ASIC back to
> normal after stop_cpsch has already been called. start_cpsch will not be
> called, since there is no resume in this case. When a reset is triggered
> again, the driver should avoid doing the uninitialization again.
>
> Signed-off-by: shaoyunl <shaoyun.liu@xxxxxxx>

If there is a possibility that stop_cpsch is called multiple times, I
think the check for that should be at the start of the function.
Something like:

	if (!dqm->sched_running)
		return 0;

Regards,
  Felix

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 42b2cc999434..bcc8980d77e0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1228,12 +1228,14 @@ static int stop_cpsch(struct device_queue_manager *dqm)
>  	if (!dqm->is_hws_hang)
>  		unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
>  	hanging = dqm->is_hws_hang || dqm->is_resetting;
> -	dqm->sched_running = false;
>
> -	pm_release_ib(&dqm->packet_mgr);
> +	if (dqm->sched_running) {
> +		dqm->sched_running = false;
> +		pm_release_ib(&dqm->packet_mgr);
> +		kfd_gtt_sa_free(dqm->dev, dqm->fence_mem);
> +		pm_uninit(&dqm->packet_mgr, hanging);
> +	}
>
> -	kfd_gtt_sa_free(dqm->dev, dqm->fence_mem);
> -	pm_uninit(&dqm->packet_mgr, hanging);
>  	dqm_unlock(dqm);
>
>  	return 0;
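
For illustration, the early-return check suggested in the review could look roughly like the userspace sketch below. This is not the kernel code: struct mock_dqm, mock_teardown(), mock_start_cpsch() and mock_stop_cpsch() are hypothetical stand-ins invented here, and locking is left out entirely. It only models the idea that a second stop call without an intervening start should do nothing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct device_queue_manager (not the kernel type). */
struct mock_dqm {
	bool sched_running;        /* mirrors dqm->sched_running */
	bool resources_allocated;  /* models the IB, fence memory and packet manager */
	int teardown_calls;        /* counts how often resources were released */
};

/* Stand-in for pm_release_ib(), kfd_gtt_sa_free() and pm_uninit(). */
static void mock_teardown(struct mock_dqm *dqm)
{
	/* Releasing twice is the failure being modelled; it would trip here. */
	assert(dqm->resources_allocated);
	dqm->resources_allocated = false;
	dqm->teardown_calls++;
}

static int mock_start_cpsch(struct mock_dqm *dqm)
{
	dqm->resources_allocated = true;
	dqm->sched_running = true;
	return 0;
}

static int mock_stop_cpsch(struct mock_dqm *dqm)
{
	/* Early return: nothing to undo if the scheduler is already stopped. */
	if (!dqm->sched_running)
		return 0;

	dqm->sched_running = false;
	mock_teardown(dqm);
	return 0;
}
```

With this shape, calling mock_stop_cpsch() twice after a single mock_start_cpsch() runs the teardown only once, and the second call returns 0 without touching the already released resources.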