That change is indeed fixing a problem. When drm_sched_job_recovery() is
called, s_job->entity should already be NULL. And even when the process is
killed we should still either re-submit the jobs or set the error.

BTW: there is another bug in that function: guilty_context needs to be kept
alive over the course of the loop. Going to submit a patch for that.

Regards,
Christian.

On 26.04.2018 at 06:23, zhoucm1 wrote:
> NAK on it.
>
> First of all, without this patch, does it cause any issue?
>
> Second, entity->last_scheduled represents the last submitted job, and
> this change breaks that meaning and doesn't work, e.g.:
>
> 1. The mirror list has jobA and jobB; assuming they belong to the same
> entity, entity->last_scheduled is jobB->finished.
>
> 2. When you do recovery, jobA is re-submitted first. If last_scheduled
> is not updated, entity->last_scheduled is still jobB->finished.
>
> 3. If the process is killed now, drm_sched_entity_cleanup() is called,
> which registers drm_sched_entity_kill_jobs_cb on jobB->finished, but
> jobB has not been re-submitted at all.
>
> So the change isn't necessary at all.
>
> Regards,
>
> David Zhou
>
>
> On 2018-04-26 11:47, Ding, Pixel wrote:
>> Hi Monk,
>>
>> Please review it. Thanks.
>>
>> --
>> Sincerely Yours,
>> Pixel
>>
>>
>> On 2018/4/25, 4:39 PM, "Pixel Ding" <Pixel.Ding at amd.com> wrote:
>>
>>     The current sequence in the scheduler thread is:
>>     1. update the last sched fence
>>     2. job begin (add to the mirror list)
>>     3. job finish (remove from the mirror list)
>>     4. back to 1
>>
>>     Since we update the last sched fence before the job joins the
>>     mirror list, the jobs in the mirror list have already passed the
>>     last sched fence. TDR just re-runs the jobs in the mirror list,
>>     so we should not update the last sched fences in TDR.
>>
>>     Signed-off-by: Pixel Ding <Pixel.Ding at amd.com>
>>     ---
>>      drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
>>      1 file changed, 3 deletions(-)
>>
>>     diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>     index 088ff2b..1f1dd70 100644
>>     --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>     +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>     @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
>>      		fence = sched->ops->run_job(s_job);
>>      		atomic_inc(&sched->hw_rq_count);
>>     -		dma_fence_put(s_job->entity->last_scheduled);
>>     -		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
>>     -
>>      		if (fence) {
>>      			s_fence->parent = dma_fence_get(fence);
>>      			r = dma_fence_add_callback(fence, &s_fence->cb,
>>     --
>>     2.7.4
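
To make David's jobA/jobB scenario concrete, below is a minimal userspace
model of how entity->last_scheduled moves in the normal path versus
recovery. It is only a sketch: struct job, struct entity, submit_job() and
recover_job() are hypothetical stand-ins for drm_sched_job,
drm_sched_entity and the scheduler thread, not the kernel API.

    /*
     * Minimal userspace model of the ordering discussed above. The
     * types and helpers here are simplified stand-ins, not the real
     * scheduler structures.
     */
    #include <stdio.h>

    struct job {
    	const char *name;
    	int finished;               /* stand-in for the finished fence */
    };

    struct entity {
    	struct job *last_scheduled; /* last job handed to the hardware */
    };

    /* Normal path: last_scheduled is updated *before* the job joins the
     * mirror list, so every job on the mirror list is already covered
     * by last_scheduled. */
    static void submit_job(struct entity *e, struct job *j)
    {
    	e->last_scheduled = j;  /* step 1: update last sched fence */
    	/* step 2: job begins, i.e. is added to the mirror list */
    }

    /* Recovery path (TDR): jobs already on the mirror list are re-run
     * in order. The patch under discussion removes the last_scheduled
     * update from this path. */
    static void recover_job(struct entity *e, struct job *j, int update)
    {
    	if (update)
    		e->last_scheduled = j;
    }

    int main(void)
    {
    	struct job a = { "jobA", 0 }, b = { "jobB", 0 };
    	struct entity e = { 0 };

    	/* Both jobs were submitted before the hang, so last_scheduled
    	 * already points at jobB: */
    	submit_job(&e, &a);
    	submit_job(&e, &b);
    	printf("before TDR:          last_scheduled = %s\n",
    	       e.last_scheduled->name);

    	/* Pre-patch behavior: re-running jobA moves last_scheduled
    	 * back to jobA until jobB is re-run as well. */
    	recover_job(&e, &a, 1);
    	printf("TDR, with update:    last_scheduled = %s\n",
    	       e.last_scheduled->name);

    	/* Patched behavior: last_scheduled stays at jobB, matching the
    	 * invariant that every job on the mirror list already passed
    	 * the last sched fence -- but, as David notes, jobB itself has
    	 * not been re-submitted yet at this point. */
    	e.last_scheduled = &b;
    	recover_job(&e, &a, 0);
    	printf("TDR, without update: last_scheduled = %s\n",
    	       e.last_scheduled->name);
    	return 0;
    }

Which final state is correct is exactly what the thread disputes: Pixel's
patch keeps last_scheduled at jobB->finished throughout TDR, while David's
point 3 is that this is only safe if jobB is guaranteed to be re-submitted
before the entity is cleaned up.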