SRIOV should not call amdgpu_amdkfd_pre_reset inside amdgpu_device_lock_adev, nor amdgpu_amdkfd_post_reset inside amdgpu_device_unlock_adev.
In branch amd-staging-dkms-4.18, SRIOV already calls amdgpu_amdkfd_pre_reset and amdgpu_amdkfd_post_reset inside amdgpu_device_reset_sriov.
These two functions need to be called inside SRIOV's amdgpu_virt_request_full_gpu, or SRIOV would hang.
The amdgpu_amdkfd_pre_reset call inside amdgpu_device_lock_adev was a duplicate for SRIOV, and it caused SRIOV to hang when entering amdgpu_device_lock_adev.
That is the reason for adding "if (!amdgpu_sriov_vf(adev))", based on branch amd-staging-dkms-4.18.

BR,
Wentao

-----Original Message-----
From: Liu, Shaoyun <Shaoyun.Liu@xxxxxxx>
Sent: Tuesday, December 11, 2018 12:10 AM
To: Lou, Wentao <Wentao.Lou@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>; Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Cc: Lou, Wentao <Wentao.Lou@xxxxxxx>
Subject: RE: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang

But KFD still needs to be notified during reset; the pre_reset call to KFD gives KFD a chance to suspend all the running process queues. Did reset work normally on SRIOV before the refactoring change for XGMI support? We shouldn't change the logic.

Regards
shaoyun.liu

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of wentalou
Sent: Friday, December 7, 2018 1:09 AM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Lou, Wentao <Wentao.Lou@xxxxxxx>
Subject: [PATCH] drm/amdgpu: kfd_pre_reset outside req_full_gpu cause sriov hang

The XGMI hive change put kfd_pre_reset into amdgpu_device_lock_adev, but outside SRIOV's req_full_gpu.
This makes SRIOV hang during reset.

Change-Id: I5b3e2a42c77b3b9635419df4470d021df7be34d1
Signed-off-by: Wentao Lou <Wentao.Lou@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ef36cc5..659dd40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3474,14 +3474,16 @@ static void amdgpu_device_lock_adev(struct amdgpu_device *adev)
 	mutex_lock(&adev->lock_reset);
 	atomic_inc(&adev->gpu_reset_counter);
 	adev->in_gpu_reset = 1;
-	/* Block kfd */
-	amdgpu_amdkfd_pre_reset(adev);
+	/* Block kfd: SRIOV would do it separately */
+	if (!amdgpu_sriov_vf(adev))
+		amdgpu_amdkfd_pre_reset(adev);
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
 {
-	/*unlock kfd */
-	amdgpu_amdkfd_post_reset(adev);
+	/*unlock kfd: SRIOV would do it separately */
+	if (!amdgpu_sriov_vf(adev))
+		amdgpu_amdkfd_post_reset(adev);
 	amdgpu_vf_error_trans_all(adev);
 	adev->in_gpu_reset = 0;
 	mutex_unlock(&adev->lock_reset);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
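
For readers following the ordering argument in this thread, here is a minimal user-space sketch of the call order the patch aims for. It is not driver code: struct mock_adev and every mock_* function are hypothetical stand-ins, and the shape of the SR-IOV reset path (KFD pre/post reset between requesting and releasing full GPU access) is an assumption taken from the description above, not from the amdgpu source.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct amdgpu_device; not the real driver type. */
struct mock_adev {
	bool is_sriov_vf;	/* mirrors what amdgpu_sriov_vf(adev) would report */
	char trace[128];	/* records the order of calls for inspection */
};

/* Append "step;" to the trace, truncating if the buffer is full. */
static void mock_log(struct mock_adev *adev, const char *step)
{
	size_t used = strlen(adev->trace);

	snprintf(adev->trace + used, sizeof(adev->trace) - used, "%s;", step);
}

/* Patched lock path: bare metal blocks KFD here, an SR-IOV VF does not. */
static void mock_lock_adev(struct mock_adev *adev)
{
	mock_log(adev, "lock");
	if (!adev->is_sriov_vf)
		mock_log(adev, "kfd_pre_reset");
}

/* Patched unlock path: same guard for the KFD post-reset notification. */
static void mock_unlock_adev(struct mock_adev *adev)
{
	if (!adev->is_sriov_vf)
		mock_log(adev, "kfd_post_reset");
	mock_log(adev, "unlock");
}

/* Assumed shape of the SR-IOV reset path described in the thread. */
static void mock_reset_sriov(struct mock_adev *adev)
{
	mock_log(adev, "request_full_gpu");
	mock_log(adev, "kfd_pre_reset");
	mock_log(adev, "reset");
	mock_log(adev, "kfd_post_reset");
	mock_log(adev, "release_full_gpu");
}

/* Run one mock recovery and return the recorded call order. */
static const char *mock_gpu_recover(struct mock_adev *adev)
{
	adev->trace[0] = '\0';
	mock_lock_adev(adev);
	if (adev->is_sriov_vf)
		mock_reset_sriov(adev);
	else
		mock_log(adev, "reset");
	mock_unlock_adev(adev);
	return adev->trace;
}
```

Calling mock_gpu_recover() on a device with is_sriov_vf set records kfd_pre_reset exactly once, after request_full_gpu. With the guard removed from mock_lock_adev(), the same run would record it twice, the first time before full GPU access is requested, which is the situation the patch description blames for the hang. KFD is still notified on the VF path, which is the concern raised in the reply.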