Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>

Andrey

On 1/14/21 8:37 AM, Horace Chen wrote:
If two jobs on two different rings time out within a very short period,
the reset for the second job is skipped because a reset is already in
progress. That does not mean the second job is not guilty: it also timed
out and may be a bad job. So before bailing out of the reset, we need to
increase the karma for this job too.

Signed-off-by: Horace Chen <horace.chen@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a28e138ac72c..d1112e29c8b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4572,6 +4572,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 		if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
 			DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
 				job ? job->base.id : -1, hive->hive_id);
+			if(job)
+				drm_sched_increase_karma(&job->base);
 			amdgpu_put_xgmi_hive(hive);
 			return 0;
 		}
@@ -4596,6 +4598,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 			dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress",
 				job ? job->base.id : -1);
 			r = 0;
+			if(job)
+				drm_sched_increase_karma(&job->base);
 			goto skip_recovery;
 		}
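For readers who want to see the idea in isolation, here is a minimal user-space sketch of the pattern the patch applies: a compare-and-exchange guard lets only one caller perform the reset, and a caller that loses the race still marks its own job as guilty before bailing. Everything in it is a hypothetical stand-in and not the kernel code: `struct sketch_job`, `sketch_increase_karma()`, `sketch_try_begin_reset()` and `sketch_end_reset()` are invented for illustration, and C11 `atomic_compare_exchange_strong()` is used in place of the kernel's `atomic_cmpxchg()`. The karma increment on the winning path is an assumption about the normal recovery flow, not something this patch shows.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduler job; not struct drm_sched_job. */
struct sketch_job {
	int karma;	/* how many times this job has been blamed for a hang */
};

/* 0 = no reset running, 1 = a reset is in progress. */
static atomic_int sketch_in_reset;

/* Hypothetical stand-in for drm_sched_increase_karma(). */
static void sketch_increase_karma(struct sketch_job *job)
{
	job->karma++;
}

/*
 * Try to start a reset on behalf of a timed-out job (job may be NULL).
 * Returns 1 if this caller now owns the reset, 0 if another reset is
 * already in progress and this caller bailed out.
 */
static int sketch_try_begin_reset(struct sketch_job *job)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&sketch_in_reset, &expected, 1)) {
		/* Bailing out: the job still timed out, so blame it too. */
		if (job)
			sketch_increase_karma(job);
		return 0;
	}

	/* Assumption: the normal recovery path also blames its job. */
	if (job)
		sketch_increase_karma(job);
	return 1;
}

/* Called by the reset owner once recovery has finished. */
static void sketch_end_reset(void)
{
	assert(atomic_load(&sketch_in_reset) == 1);
	atomic_store(&sketch_in_reset, 0);
}
```

In this sketch, a second call to `sketch_try_begin_reset()` made before `sketch_end_reset()` returns 0 but still raises the second job's karma, which is the behaviour the two added `if(job)` blocks aim for.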
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx