On 12/12/2017 04:01 AM, Christian König wrote:
> Am 11.12.2017 um 22:29 schrieb Marek Olšák:
>> From: Marek Olšák <marek.olsak at amd.com>
>>
>> Signed-off-by: Marek Olšák <marek.olsak at amd.com>
>> ---
>>
>> Is this really correct? I have no easy way to test it.
>
> It's a step in the right direction, but I would rather vote for
> something else:
>
> Instead of disabling the timeout by default we only disable the GPU
> reset/recovery.
>
> The idea is to add a new parameter amdgpu_gpu_recovery which makes
> amdgpu_gpu_recover only print out an error and not touch the GPU
> at all (on bare metal systems).
>
> Then we finally set the amdgpu_lockup_timeout to a non-zero value by
> default.
>
> Andrey, could you take care of this when you have time?
>
> Thanks,
> Christian.

Sure.

Thanks,
Andrey

>
>>
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 8d03baa..56c41cf 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3018,20 +3018,24 @@ static int amdgpu_reset_sriov(struct
>> amdgpu_device *adev, uint64_t *reset_flags,
>>   *
>>   * Attempt to reset the GPU if it has hung (all asics).
>>   * Returns 0 for success or an error on failure.
>>   */
>>  int amdgpu_gpu_recover(struct amdgpu_device *adev, struct
>> amdgpu_job *job)
>>  {
>>      struct drm_atomic_state *state = NULL;
>>      uint64_t reset_flags = 0;
>>      int i, r, resched;
>>
>> +   /* amdgpu.lockup_timeout=0 disables GPU reset. */
>> +   if (amdgpu_lockup_timeout == 0)
>> +       return 0;
>> +
>>      if (!amdgpu_check_soft_reset(adev)) {
>>          DRM_INFO("No hardware hang detected. Did some blocks
>> stall?\n");
>>          return 0;
>>      }
>>
>>      dev_info(adev->dev, "GPU reset begin!\n");
>>
>>      mutex_lock(&adev->lock_reset);
>>      atomic_inc(&adev->gpu_reset_counter);
>>      adev->in_gpu_reset = 1;
>