On 30.06.2016 at 09:09, Chunming Zhou wrote:
> Change-Id: If673e1708b6207d70a26f64067dc1b0b24e868e7
> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 47 +++++++++++++-----------------
>  1 file changed, 20 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5c4691c..60b6dd0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1951,8 +1951,10 @@ int amdgpu_gpu_reset(struct amdgpu_device *adev)
>  			continue;
>
>  		kthread_park(ring->sched.thread);
> +		amd_sched_hw_job_reset(&ring->sched);
>  	}
> -
> +	/* after all hw jobs are reset, hw fence is meanless, so force_completion */
> +	amdgpu_fence_driver_force_completion(adev);
>  	/* block TTM */
>  	resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);

Unrelated to this change, but I just noticed that we should probably
block TTM before parking the scheduler. Otherwise we could end up with
this call waiting for the TTM workqueue, and the TTM workqueue waiting
for the scheduler, which is already blocked.
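For illustration, the reordering suggested here might look something like the following. This is only a sketch against the quoted hunk of amdgpu_gpu_reset(), assuming the surrounding variables (adev, ring, i, resched) from that function; it is not a tested patch:

```c
	/* Sketch only: block TTM first, so its delayed workqueue cannot
	 * end up waiting on a scheduler thread that is already parked. */
	resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);

	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (!ring)
			continue;

		/* Park the scheduler thread and reset its hardware jobs
		 * only after TTM is already blocked. */
		kthread_park(ring->sched.thread);
		amd_sched_hw_job_reset(&ring->sched);
	}

	/* After all hw jobs are reset the hw fences are meaningless,
	 * so force-complete them. */
	amdgpu_fence_driver_force_completion(adev);
```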
>  	/* store modesetting */
> @@ -1994,33 +1996,24 @@ retry:
>  	}
>  	/* restore scratch */
>  	amdgpu_atombios_scratch_regs_restore(adev);
> -	if (0) {
> -		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> -			struct amdgpu_ring *ring = adev->rings[i];
> -			if (!ring)
> -				continue;
> -			kthread_unpark(ring->sched.thread);
> -			amdgpu_ring_restore(ring, ring_sizes[i], ring_data[i]);
> -			ring_sizes[i] = 0;
> -			ring_data[i] = NULL;
> -		}
>
> -		r = amdgpu_ib_ring_tests(adev);
> -		if (r) {
> -			dev_err(adev->dev, "ib ring test failed (%d).\n", r);
> -			if (saved) {
> -				saved = false;
> -				r = amdgpu_suspend(adev);
> -				goto retry;
> -			}
> -		}
> -	} else {
> -		amdgpu_fence_driver_force_completion(adev);
> -		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> -			if (adev->rings[i]) {
> -				kthread_unpark(adev->rings[i]->sched.thread);
> -				kfree(ring_data[i]);
> -			}
> +	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> +		struct amdgpu_ring *ring = adev->rings[i];
> +		if (!ring)
> +			continue;
> +		amdgpu_ring_restore(ring, ring_sizes[i], ring_data[i]);
> +		kthread_unpark(ring->sched.thread);
> +		ring_sizes[i] = 0;
> +		ring_data[i] = NULL;
> +	}
> +
> +	r = amdgpu_ib_ring_tests(adev);
> +	if (r) {
> +		dev_err(adev->dev, "ib ring test failed (%d).\n", r);
> +		if (saved) {
> +			saved = false;
> +			r = amdgpu_suspend(adev);
> +			goto retry;

Is it intentional that this enables the ring backup again? In addition
to that, we should probably still react gracefully to a failed GPU
reset.

Regards,
Christian.

> 		}
> 	}
>
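One possible shape for the graceful-failure path asked for above, again only a hypothetical sketch of the quoted hunk (the final comment block and its suggested behavior are illustrative, not existing amdgpu code):

```c
	r = amdgpu_ib_ring_tests(adev);
	if (r) {
		dev_err(adev->dev, "ib ring test failed (%d).\n", r);
		if (saved) {
			/* One retry from the saved state. */
			saved = false;
			r = amdgpu_suspend(adev);
			goto retry;
		}
		/* Sketch: out of retries, so don't pretend the reset
		 * worked. For example, leave the schedulers parked and
		 * return the error to the caller instead of falling
		 * through to the normal resume path. */
	}
```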