Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>

Andrey

On 1/19/21 7:22 AM, Horace Chen wrote:
If 2 jobs on 2 different rings time out within a very short period,
the reset for the second job will be skipped because a reset is
already in progress.

But that doesn't mean the second job is not guilty, since it also
timed out and can be a bad job. So before bailing out of the reset,
we need to increase the karma for this job too.

Signed-off-by: Horace Chen <horace.chen@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 9574da3abc32..1d6ff9fe37de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4574,6 +4574,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 			DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
 				job ? job->base.id : -1, hive->hive_id);
 			amdgpu_put_xgmi_hive(hive);
+			if (job)
+				drm_sched_increase_karma(&job->base);
 			return 0;
 		}
 		mutex_lock(&hive->hive_lock);
@@ -4617,6 +4619,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 				job ? job->base.id : -1);
 			r = 0;
 			/* even we skipped this reset, still need to set the job to guilty */
+			if (job)
+				drm_sched_increase_karma(&job->base);
 			goto skip_recovery;
 		}
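
For readers following the thread, the behaviour the patch is after can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the driver code: every `model_*` name is hypothetical, the `in_reset` flag stands in for the driver's "reset already in progress" locking, and the model assumes the full reset path also blames the job that triggered it.

```c
/* Hypothetical user-space model; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_job {
	int karma;       /* incremented each time the job is blamed for a hang */
};

struct model_device {
	bool in_reset;   /* stands in for the "reset already in progress" lock */
	int resets_run;  /* number of resets that actually ran */
};

/* Stand-in for drm_sched_increase_karma(): blame the job once. */
static void model_increase_karma(struct model_job *job)
{
	job->karma++;
}

/*
 * Returns true if this call performed the reset, false if it bailed out
 * because another reset was already in progress.
 */
static bool model_gpu_recover(struct model_device *dev, struct model_job *job)
{
	assert(dev != NULL);

	if (dev->in_reset) {
		/* Bailing out: the job still timed out, so blame it first. */
		if (job)
			model_increase_karma(job);
		return false;
	}

	dev->in_reset = true;
	/* Assumption: the full reset path blames the triggering job too. */
	if (job)
		model_increase_karma(job);
	dev->resets_run++;
	/* in_reset stays set so an overlapping timeout can be modelled. */
	return true;
}

/* Marks the modelled reset as finished. */
static void model_reset_done(struct model_device *dev)
{
	dev->in_reset = false;
}
```

Calling `model_gpu_recover()` for job A and then for job B before `model_reset_done()` runs only one reset, yet both jobs end up with a karma of 1, which is the outcome the two added `if (job)` blocks aim for.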
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx