On 11.12.2017 at 22:29, Marek Olšák wrote:
> From: Marek Olšák <marek.olsak at amd.com>
>
> Signed-off-by: Marek Olšák <marek.olsak at amd.com>
> ---
>
> Is this really correct? I have no easy way to test it.

It's a step in the right direction, but I would rather vote for something else:

Instead of disabling the timeout by default, we only disable the GPU reset/recovery.

The idea is to add a new parameter amdgpu_gpu_recovery which makes amdgpu_gpu_recover only print out an error and not touch the GPU at all (on bare metal systems).

Then we finally set amdgpu_lockup_timeout to a non-zero value by default.

Andrey, could you take care of this when you have time?

Thanks,
Christian.

>
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 8d03baa..56c41cf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3018,20 +3018,24 @@ static int amdgpu_reset_sriov(struct amdgpu_device *adev, uint64_t *reset_flags,
>    *
>    * Attempt to reset the GPU if it has hung (all asics).
>    * Returns 0 for success or an error on failure.
>    */
>   int amdgpu_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job)
>   {
>   	struct drm_atomic_state *state = NULL;
>   	uint64_t reset_flags = 0;
>   	int i, r, resched;
>
> +	/* amdgpu.lockup_timeout=0 disables GPU reset. */
> +	if (amdgpu_lockup_timeout == 0)
> +		return 0;
> +
>   	if (!amdgpu_check_soft_reset(adev)) {
>   		DRM_INFO("No hardware hang detected. Did some blocks stall?\n");
>   		return 0;
>   	}
>
>   	dev_info(adev->dev, "GPU reset begin!\n");
>
>   	mutex_lock(&adev->lock_reset);
>   	atomic_inc(&adev->gpu_reset_counter);
>   	adev->in_gpu_reset = 1;
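To make Christian's counter-proposal concrete, here is a minimal userspace sketch of the decision it implies: the lockup timeout stays enabled, but on bare metal the handler only reports the hang unless recovery is explicitly allowed. This is not driver code. The globals `lockup_timeout_ms` and `gpu_recovery`, the function `model_gpu_recover`, the enum, and the 10000 ms default are all hypothetical stand-ins for the proposed amdgpu_lockup_timeout/amdgpu_gpu_recovery parameters. It also assumes, based on the "(on bare metal systems)" remark, that SR-IOV virtual functions keep resetting regardless of the new parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the module parameters under discussion.
 * In the kernel these would be module parameters; here they are plain
 * globals so the sketch compiles and runs in userspace. */
static int lockup_timeout_ms = 10000; /* assumed non-zero default */
static int gpu_recovery = 0;          /* hypothetical: 0 = report only, 1 = reset */

/* What the modeled recovery handler decides to do. */
enum recover_action {
	RECOVER_NONE,   /* timeout disabled: no hang handling at all */
	RECOVER_REPORT, /* log the hang, leave the hardware untouched */
	RECOVER_RESET,  /* go down the full GPU reset path */
};

/* Models the proposed policy: the timeout keeps firing, but on bare
 * metal the reset itself is skipped unless gpu_recovery is enabled.
 * Assumption: SR-IOV virtual functions always take the reset path. */
static enum recover_action model_gpu_recover(bool is_sriov_vf)
{
	assert(gpu_recovery == 0 || gpu_recovery == 1);

	if (lockup_timeout_ms == 0)
		return RECOVER_NONE;

	if (!gpu_recovery && !is_sriov_vf) {
		fprintf(stderr, "GPU lockup detected, recovery disabled\n");
		return RECOVER_REPORT;
	}

	return RECOVER_RESET;
}
```

With the assumed defaults, a bare-metal hang yields `RECOVER_REPORT`, whereas Marek's patch, with the timeout set to 0, never reaches any hang handling at all. Keeping the two knobs separate is what lets the timeout default to non-zero without turning on resets.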