[AMD Official Use Only] Ye, a lot already been changed since then , now the pre_reset and post_reset not in the lock/unlock anymore. With my previous change , we make kfd_pre_reset avoid touch HW . Now it's pure SW handling , should be safe to be moved out of the full access . Anyway, thanks to bring this up, it will remind us to verify on the XGMI configuration on SRIOV. Regards shaoyun.liu -----Original Message----- From: Kuehling, Felix <Felix.Kuehling@xxxxxxx> Sent: Friday, November 5, 2021 1:48 PM To: Liu, Shaoyun <Shaoyun.Liu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Subject: Re: [PATCH] drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov There was a reason why pre_reset was done differently on SRIOV. However, the code has changed a lot since then. Is this concern still valid? > commit 7b184b006185215daf4e911f8de212964c99a514 > Author: wentalou <Wentao.Lou@xxxxxxx> > Date: Fri Dec 7 13:53:18 2018 +0800 > > drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang > > XGMI hive put kfd_pre_reset into amdgpu_device_lock_adev, > but outside req_full_gpu of sriov. > It would make sriov hang during reset. > > Signed-off-by: Wentao Lou <Wentao.Lou@xxxxxxx> > Reviewed-by: Shaoyun Liu <Shaoyun.Liu@xxxxxxx> > Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx> Regards, Felix On 2021-11-05 12:57 p.m., shaoyunl wrote: > The KFD pre_reset should be called before reset been executed, it will > hold the lock to prevent other rocm process to sent the packlage to > hiq during host execute the real reset on the HW > > Signed-off-by: shaoyunl <shaoyun.liu@xxxxxxx> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +---- > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 95fec36e385e..d7c9dce17cad 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -4278,8 +4278,6 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, > if (r) > return r; > > - amdgpu_amdkfd_pre_reset(adev); > - > /* Resume IP prior to SMC */ > r = amdgpu_device_ip_reinit_early_sriov(adev); > if (r) > @@ -5015,8 +5013,7 @@ int amdgpu_device_gpu_recover(struct > amdgpu_device *adev, > > cancel_delayed_work_sync(&tmp_adev->delayed_init_work); > > - if (!amdgpu_sriov_vf(tmp_adev)) > - amdgpu_amdkfd_pre_reset(tmp_adev); > + amdgpu_amdkfd_pre_reset(tmp_adev); > > /* > * Mark these ASICs to be reseted as untracked first