NAK. This change breaks the SRIOV logic.

Without lockup_timeout set, gpu_recover() is not called at all, unless your IB triggered an invalid instruction and that IRQ invoked amdgpu_gpu_recover(). In that case you should disable the logic in that IRQ handler instead of changing gpu_recover() itself, because for SRIOV we need gpu_recover() even when lockup_timeout is zero.

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of Marek Olšák
Sent: December 12, 2017 5:30
To: amd-gfx at lists.freedesktop.org
Subject: [PATCH] amdgpu: disable GPU reset if amdgpu.lockup_timeout=0

From: Marek Olšák <marek.olsak@xxxxxxx>

Signed-off-by: Marek Olšák <marek.olsak at amd.com>
---
Is this really correct? I have no easy way to test it.

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8d03baa..56c41cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3018,20 +3018,24 @@ static int amdgpu_reset_sriov(struct amdgpu_device *adev, uint64_t *reset_flags,
  *
  * Attempt to reset the GPU if it has hung (all asics).
  * Returns 0 for success or an error on failure.
  */
 int amdgpu_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job)
 {
 	struct drm_atomic_state *state = NULL;
 	uint64_t reset_flags = 0;
 	int i, r, resched;
 
+	/* amdgpu.lockup_timeout=0 disables GPU reset. */
+	if (amdgpu_lockup_timeout == 0)
+		return 0;
+
 	if (!amdgpu_check_soft_reset(adev)) {
 		DRM_INFO("No hardware hang detected. Did some blocks stall?\n");
 		return 0;
 	}
 
 	dev_info(adev->dev, "GPU reset begin!\n");
 
 	mutex_lock(&adev->lock_reset);
 	atomic_inc(&adev->gpu_reset_counter);
 	adev->in_gpu_reset = 1;
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
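To illustrate the alternative the review asks for (apply the lockup_timeout=0 policy at the IRQ call site and leave the recovery routine unconditional so the SRIOV path keeps working), here is a minimal, standalone C model. Everything in it is hypothetical: `fake_lockup_timeout`, `struct fake_adev`, `fake_gpu_recover()`, `fake_priv_inst_irq()` and `fake_sriov_flr_work()` are illustrative stand-ins, not amdgpu functions, and the real IRQ handler and SRIOV recovery paths are not shown in this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the amdgpu.lockup_timeout module parameter (ms). */
int fake_lockup_timeout = 0;

/* Hypothetical, heavily simplified device state; NOT struct amdgpu_device. */
struct fake_adev {
	bool is_sriov;   /* running as an SR-IOV virtual function */
	int reset_count; /* how many times recovery actually ran */
};

/* Recovery itself stays unconditional, as the review requests. */
int fake_gpu_recover(struct fake_adev *adev)
{
	adev->reset_count++;
	return 0;
}

/*
 * Hypothetical handler for an "invalid instruction" interrupt raised by a
 * bad IB. The lockup_timeout=0 policy is checked here, at the caller,
 * instead of inside fake_gpu_recover().
 */
void fake_priv_inst_irq(struct fake_adev *adev)
{
	if (fake_lockup_timeout == 0)
		return; /* GPU reset disabled by the user: do not recover */
	fake_gpu_recover(adev);
}

/*
 * Hypothetical SRIOV recovery path: it must still reach recovery even
 * when lockup_timeout is 0, which is why the check cannot live in
 * fake_gpu_recover().
 */
void fake_sriov_flr_work(struct fake_adev *adev)
{
	if (adev->is_sriov)
		fake_gpu_recover(adev);
}
```

With `fake_lockup_timeout` at 0, calling `fake_priv_inst_irq()` leaves `reset_count` unchanged while `fake_sriov_flr_work()` on an SRIOV device still increments it; setting the timeout to a nonzero value lets the IRQ path recover as well. Keeping the policy at each call site means only the bare-metal IRQ trigger is affected by the module parameter.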