Re: [PATCH v3 1/5] drm/scheduler: rework job destruction

Christian König <ckoenig.leichtzumerken@xxxxxxxxx> · Tue, 16 Apr 2019 11:47:37 +0200

Am 15.04.19 um 23:17 schrieb Eric Anholt:
Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> writes:

From: Christian König <christian.koenig@xxxxxxx>

We now destroy finished jobs from the worker thread to make sure that
we never destroy a job currently in timeout processing.
By this we avoid holding lock around ring mirror list in drm_sched_stop
which should solve a deadlock reported by a user.

v2: Remove unused variable.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692

Signed-off-by: Christian König <christian.koenig@xxxxxxx>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  17 ++--
  drivers/gpu/drm/etnaviv/etnaviv_dump.c     |   4 -
  drivers/gpu/drm/etnaviv/etnaviv_sched.c    |   9 +-
  drivers/gpu/drm/scheduler/sched_main.c     | 138 +++++++++++++++++------------
  drivers/gpu/drm/v3d/v3d_sched.c            |   9 +-
Missing corresponding panfrost and lima updates.  You should probably
pull in drm-misc for hacking on the scheduler.

diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index ce7c737b..8efb091 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -232,11 +232,18 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
  
  	/* block scheduler */
  	for (q = 0; q < V3D_MAX_QUEUES; q++)
-		drm_sched_stop(&v3d->queue[q].sched);
+		drm_sched_stop(&v3d->queue[q].sched, sched_job);
  
  	if(sched_job)
  		drm_sched_increase_karma(sched_job);
  
+	/*
+	 * Guilty job did complete and hence needs to be manually removed
+	 * See drm_sched_stop doc.
+	 */
+	if (list_empty(&sched_job->node))
+		sched_job->sched->ops->free_job(sched_job);
If the if (sched_job) is necessary up above, then this should clearly be
under it.

But, can we please have a core scheduler thing we call here instead of
drivers all replicating it?

Yeah that's also something I noted before.

Essential problem is that we remove finished jobs from the mirror list 
and so need to destruct them because we otherwise leak them.

Alternative approach here would be to keep the jobs on the ring mirror 
list, but not submit them again.

Regards,
Christian.


+
  	/* get the GPU back into the init state */
  	v3d_reset(v3d);
  

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx