Re: [PATCH 2/2] drm/sched: fix timeout handling

On 08.10.2018 at 19:40, Nayan Deshmukh wrote:
Hi Christian,

I am a bit confused by this patch. It would be better if you could
explain what these 3 functions are supposed to do. From my
understanding, this is what they are supposed to do:

1. drm_sched_job_timedout: This function is called when a job has been
executing for more than "timeout" seconds. It needs to identify the
offending job and call the timedout_job callback, which notifies the
driver of that job so that the GPU can be recovered to its original
state.

2. drm_sched_hw_job_reset: stop the scheduler if it contains the bad
job. At the moment this function first removes all the callbacks and
only afterwards goes through the ring_mirror_list to find the bad job.
I think it should do it in the opposite order, or my understanding of
this function is not fully correct.

That function actually has a couple of bugs itself, but they are irrelevant to the current patch.

3. drm_sched_job_recovery: recover jobs after a reset. It resubmits
the jobs to the hardware queue while avoiding the guilty job.

That function is completely unrelated and only called after recovering from a hard reset.
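
To make point 1 above concrete, here is a minimal sketch of what a driver-side timedout_job callback could look like. Only the drm_sched_backend_ops hook itself comes from the scheduler API; the foo_* helpers are hypothetical placeholders for driver code, not anything from this series.

#include <drm/gpu_scheduler.h>

/* Assumed driver helpers, not part of the scheduler API. */
bool foo_ring_soft_recover(struct drm_gpu_scheduler *sched);
void foo_gpu_recover(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);

static void foo_timedout_job(struct drm_sched_job *job)
{
        struct drm_gpu_scheduler *sched = job->sched;

        /* Called from drm_sched_job_timedout() with the oldest unfinished
         * job.  Try a quick engine recovery first; if that fails, fall back
         * to a full reset which stops and restarts the scheduler around it. */
        if (foo_ring_soft_recover(sched))
                return;

        foo_gpu_recover(sched, job);
}

static const struct drm_sched_backend_ops foo_sched_ops = {
        /* .dependency, .run_job and .free_job omitted for brevity */
        .timedout_job = foo_timedout_job,
};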


Some other questions and suggestions:

1. We can avoid the race condition altogether if we shift the
canceling and rescheduling of the timeout work item into
drm_sched_process_job().

That won't work. drm_sched_process_job() is called from interrupt context and can't synchronize with the timeout worker.
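
The constraint is roughly the following (illustrative sketch; the delayed work stands in for the scheduler's work_tdr):

#include <linux/workqueue.h>

static void called_from_fence_irq_context(struct delayed_work *work_tdr)
{
        /*
         * Safe in interrupt context, but only cancels a still pending
         * timeout; it does not wait for an already running
         * drm_sched_job_timedout().
         */
        cancel_delayed_work(work_tdr);

        /*
         * What would actually be needed to "sync" with the timeout worker;
         * this can sleep and is therefore forbidden where
         * drm_sched_process_job() runs.
         */
        /* cancel_delayed_work_sync(work_tdr); */
}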

2. How does removing and re-adding the callbacks help with the race
condition? Moreover, the hardware might execute some jobs while we are
in this function, leading to more races.

We need to make sure that the underlying hardware fence doesn't signal and trigger new processing while we are about to call the driver's timeout function to reset the hardware.

This is done by removing the callbacks in reverse order (i.e. newest to oldest).

If we find that we can't remove a callback because the hardware has actually continued and the fence has already signaled, we add the callbacks back in normal order (i.e. oldest to newest), starting from the job which has already signaled.
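
Condensed, that scheme looks roughly like the sketch below; the real implementation, including the locking and the timedout_job() call, is the hunk in the patch quoted further down.

#include <drm/gpu_scheduler.h>
#include <linux/dma-fence.h>
#include <linux/list.h>

static void timeout_handling_sketch(struct list_head *ring_mirror_list)
{
        struct drm_sched_job *job;

        /* 1. Newest to oldest: detach the completion callbacks so the
         *    hardware fences can't trigger new processing under us. */
        list_for_each_entry_reverse(job, ring_mirror_list, node) {
                struct drm_sched_fence *fence = job->s_fence;

                if (!dma_fence_remove_callback(fence->parent, &fence->cb))
                        break; /* already signaled, the hardware moved on */
        }

        /* 2. Call the driver's timedout_job() on the oldest job. */

        /* 3. Oldest to newest: re-add the callbacks, and for fences that
         *    signaled in the meantime call drm_sched_process_job() by hand,
         *    starting from the job that had already signaled. */
}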


3. What is the order in which the above 3 functions should be executed?
I think the answer to this question might clear up a lot of my doubts.

If we can quickly recover from a problem, only drm_sched_job_timedout() should be executed.

The other two are for hard resets where we need to stop multiple scheduler instances and get them running again.
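
As a rough sketch of that hard-reset path (loosely modelled on how amdgpu uses these helpers; the foo_* names and the fixed ring count are assumptions, not code from this series):

#include <drm/gpu_scheduler.h>
#include <linux/kernel.h>

struct foo_device {
        struct drm_gpu_scheduler sched[4]; /* one scheduler per ring, assumed */
};

void foo_asic_reset(struct foo_device *fdev); /* hypothetical hardware reset */

static void foo_gpu_hard_reset(struct foo_device *fdev, struct drm_sched_job *bad)
{
        unsigned int i;

        /* Stop every scheduler and detach the hardware fence callbacks. */
        for (i = 0; i < ARRAY_SIZE(fdev->sched); i++)
                drm_sched_hw_job_reset(&fdev->sched[i], bad);

        /* Reset the hardware while nothing new gets processed. */
        foo_asic_reset(fdev);

        /* Resubmit the remaining jobs, skipping the guilty one. */
        for (i = 0; i < ARRAY_SIZE(fdev->sched); i++)
                drm_sched_job_recovery(&fdev->sched[i]);
}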

That the karma handling is mixed into these reset functions is rather unfortunate, and it is actually quite buggy as well.

I should probably also clean that up.

Regards,
Christian.


Regards,
Nayan

On Mon, Oct 8, 2018 at 8:36 PM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
We need to make sure that we don't race between job completion and
timeout.

Signed-off-by: Christian König <christian.koenig@xxxxxxx>
---
  drivers/gpu/drm/scheduler/sched_main.c | 28 +++++++++++++++++++++++++++-
  1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index bd7d11c47202..ad3c57c9fd21 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -248,14 +248,40 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
  static void drm_sched_job_timedout(struct work_struct *work)
  {
         struct drm_gpu_scheduler *sched;
+       struct drm_sched_fence *fence;
         struct drm_sched_job *job;
+       int r;

         sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
+
+       spin_lock(&sched->job_list_lock);
+       list_for_each_entry_reverse(job, &sched->ring_mirror_list, node) {
+               fence = job->s_fence;
+               if (!dma_fence_remove_callback(fence->parent, &fence->cb))
+                       goto already_signaled;
+       }
+
         job = list_first_entry_or_null(&sched->ring_mirror_list,
                                        struct drm_sched_job, node);
+       spin_unlock(&sched->job_list_lock);

         if (job)
-               job->sched->ops->timedout_job(job);
+               sched->ops->timedout_job(job);
+
+       spin_lock(&sched->job_list_lock);
+       list_for_each_entry(job, &sched->ring_mirror_list, node) {
+               fence = job->s_fence;
+               if (!fence->parent || !list_empty(&fence->cb.node))
+                       continue;
+
+               r = dma_fence_add_callback(fence->parent, &fence->cb,
+                                          drm_sched_process_job);
+               if (r)
+already_signaled:
+                       drm_sched_process_job(fence->parent, &fence->cb);
+
+       }
+       spin_unlock(&sched->job_list_lock);
  }

  /**
--
2.14.1

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel




