When a compute fence does not signal, the compute ring cannot detect a
hardware hang because its timeout value is infinite by default.

In SR-IOV and passthrough mode, if the user does not declare a custom
timeout value for the compute ring, use the gfx ring timeout value as
the default, so that the compute ring can detect a true hardware hang.

Change-Id: I794ec0868c6c0aad407749457260ecfee0617c10
Signed-off-by: Jesse Zhang <zhexi.zhang@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 7c7e9f5..6cd5548 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -687,6 +687,16 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
 	adev->rev_id = soc15_get_rev_id(adev);
 	adev->nbio.funcs->detect_hw_virt(adev);
 
+	/*
+	 * If running under SR-IOV or passthrough mode and the user did not
+	 * set a custom value for the compute ring timeout, set it to the
+	 * same value as the gfx ring timeout so that the compute ring can
+	 * detect a true hang.
+	 */
+	if ((amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev)) &&
+	    (adev->compute_timeout == MAX_SCHEDULE_TIMEOUT))
+		adev->compute_timeout = adev->gfx_timeout;
+
 	if (amdgpu_sriov_vf(adev))
 		adev->virt.ops = &xgpu_ai_virt_ops;
 
-- 
2.7.4
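For reviewers who want to check the fallback rule outside the driver, here is a minimal userspace sketch of the same condition. It is not driver code: `struct model_dev`, its boolean fields, and `model_fixup_compute_timeout()` are hypothetical stand-ins for `struct amdgpu_device`, `amdgpu_sriov_vf()`, `amdgpu_passthrough()`, and the hunk in `soc15_set_ip_blocks()`. The only kernel fact assumed is that `MAX_SCHEDULE_TIMEOUT` is defined as `LONG_MAX`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in: the kernel defines MAX_SCHEDULE_TIMEOUT as LONG_MAX. */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical, simplified model of the state the patch reads and writes. */
struct model_dev {
	bool is_sriov_vf;     /* stands in for amdgpu_sriov_vf(adev) */
	bool is_passthrough;  /* stands in for amdgpu_passthrough(adev) */
	long gfx_timeout;     /* gfx ring timeout */
	long compute_timeout; /* MAX_SCHEDULE_TIMEOUT means "never time out" */
};

/*
 * Mirror of the condition added by the patch: only when virtualized
 * (SR-IOV VF or passthrough) and only when the compute timeout is still
 * infinite, fall back to the gfx ring timeout.
 */
static void model_fixup_compute_timeout(struct model_dev *dev)
{
	assert(dev != NULL);

	if ((dev->is_sriov_vf || dev->is_passthrough) &&
	    dev->compute_timeout == MAX_SCHEDULE_TIMEOUT)
		dev->compute_timeout = dev->gfx_timeout;
}
```

In this model, bare metal keeps the infinite compute timeout, and a finite value is never overwritten. Because the check compares only against `MAX_SCHEDULE_TIMEOUT`, the sketch cannot tell an explicitly requested infinite timeout from the default one.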