Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

Lucas - ping on my question. I have also attached this temporary solution for etnaviv to clarify my point. If that is acceptable, at least for now, I can do the same for v3d, where it requires a bit more code changes.

Andrey

On 2/6/20 10:49 AM, Andrey Grodzovsky wrote:
Well a revert would break our driver.

The real solution is that somebody needs to sit down, gather ALL the requirements and then come up with a solution which is clean and works for everyone.

Christian.


I can take this on, as indeed our general design here is becoming more and more entangled as GPU reset scenarios grow in complexity (at least in the AMD driver). Currently I am on a high-priority internal task which should take me around a week or two to finish, and after that I can get to it.

Regarding a temporary solution: I looked into the v3d and etnaviv use cases, and we in AMD actually face the same scenario, where we decide to skip the HW reset if the guilty job did finish by the time we are processing the timeout (see amdgpu_device_gpu_recover and the skip_hw_reset goto). The difference is that we always call drm_sched_stop/start, irrespective of whether we are actually going to HW reset or not (same as extending the timeout). I wonder if something like this can also be done for v3d and etnaviv?
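
A minimal userspace sketch of that control flow, for illustration only. The mock_* functions and struct mock_sched are hypothetical stand-ins and not the real DRM scheduler API; the counters exist only so the behaviour can be checked. The point is just the shape: stop the scheduler first, decide whether a HW reset is needed, and restart the scheduler on every path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduler; it only counts calls. */
struct mock_sched {
	int stop_calls;
	int start_calls;
	int hw_resets;
};

/* Hypothetical stubs; the real drm_sched_stop/start do far more. */
static void mock_sched_stop(struct mock_sched *s)  { s->stop_calls++; }
static void mock_sched_start(struct mock_sched *s) { s->start_calls++; }
static void mock_hw_reset(struct mock_sched *s)    { s->hw_resets++; }

/*
 * Timeout handler shape: the stop/start pair brackets the whole
 * handler, and the early-exit cases jump past the reset instead of
 * returning before the scheduler was ever stopped.
 */
static void mock_timedout_job(struct mock_sched *s, bool fence_signaled,
			      bool making_progress)
{
	assert(s != NULL);

	mock_sched_stop(s);

	/* Spurious timeout, or the GPU is still moving: no reset needed. */
	if (fence_signaled || making_progress)
		goto skip_hw_reset;

	mock_hw_reset(s);

skip_hw_reset:
	/* Always reached, whether or not a reset happened. */
	mock_sched_start(s);
}
```

With this shape, every invocation of the handler produces exactly one stop and one start, and only a genuine hang also produces a reset.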

Andrey
From c3fa87856608463f14dddb03346c31054f3137c9 Mon Sep 17 00:00:00 2001
From: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
Date: Mon, 10 Feb 2020 11:44:39 -0500
Subject: drm/etnaviv: Always execute sched stop and start.

During job timeout always stop and restart the scheduler even
if no HW reset is taking place.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
---
 drivers/gpu/drm/etnaviv/etnaviv_sched.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 4e3e95d..270caa8 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -89,12 +89,17 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
 	u32 dma_addr;
 	int change;
 
+
+
+	/* block scheduler */
+	drm_sched_stop(&gpu->sched, sched_job);
+
 	/*
 	 * If the GPU managed to complete this jobs fence, the timout is
 	 * spurious. Bail out.
 	 */
 	if (dma_fence_is_signaled(submit->out_fence))
-		return;
+		goto skip_hw_reset;
 
 	/*
 	 * If the GPU is still making forward progress on the front-end (which
@@ -105,12 +110,9 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
 	change = dma_addr - gpu->hangcheck_dma_addr;
 	if (change < 0 || change > 16) {
 		gpu->hangcheck_dma_addr = dma_addr;
-		return;
+		goto skip_hw_reset;
 	}
 
-	/* block scheduler */
-	drm_sched_stop(&gpu->sched, sched_job);
-
 	if(sched_job)
 		drm_sched_increase_karma(sched_job);
 
@@ -120,6 +122,9 @@ static void etnaviv_sched_timedout_job(struct drm_sched_job *sched_job)
 
 	drm_sched_resubmit_jobs(&gpu->sched);
 
+
+skip_hw_reset:
+
 	/* restart scheduler after GPU is usable again */
 	drm_sched_start(&gpu->sched, true);
 }
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx