That change is indeed fixing a problem. When drm_sched_job_recovery() is
called, s_job->entity should already be NULL. And even when the process is
killed we should still either re-submit the jobs or set the error.

BTW: there is another bug in that function: guilty_context needs to be kept
alive over the course of the loop. Going to submit a patch for that.

Regards,
Christian.

On 26.04.2018 at 06:23, zhoucm1 wrote:
> NAK on it.
>
> First of all, without this patch, does it cause any issue?
>
> Second, entity->last_scheduled represents the last submitted job, and
> this change breaks that meaning and doesn't work, e.g.:
>
> 1. The mirror list has jobA and jobB; assuming they belong to the same
> entity, entity->last_scheduled is jobB->finished.
>
> 2. When you do recovery, jobA is re-submitted first. If last_scheduled
> is not updated, entity->last_scheduled is still jobB->finished.
>
> 3. If the process is killed now, drm_sched_entity_cleanup() is called,
> which registers drm_sched_entity_kill_jobs_cb on jobB->finished, but
> jobB has not been re-submitted at all.
>
> So the change isn't necessary at all.
>
> Regards,
>
> David Zhou
>
>
> On 2018-04-26 11:47, Ding, Pixel wrote:
>> Hi Monk,
>>
>> Please review it. Thanks.
>>
>> --
>> Sincerely Yours,
>> Pixel
>>
>>
>> On 2018/4/25, 4:39 PM, "Pixel Ding" <Pixel.Ding at amd.com> wrote:
>>
>>     The current sequence in the scheduler thread is:
>>     1. update the last sched fence
>>     2. job begin (add to the mirror list)
>>     3. job finish (remove from the mirror list)
>>     4. back to 1
>>
>>     Since we update the last sched fence before the job joins the
>>     mirror list, the jobs in the mirror list have already passed the
>>     last sched fence. TDR just re-runs the jobs in the mirror list,
>>     so we should not update the last sched fences in TDR.
>>
>>     Signed-off-by: Pixel Ding <Pixel.Ding at amd.com>
>>     ---
>>      drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
>>      1 file changed, 3 deletions(-)
>>
>>     diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>     index 088ff2b..1f1dd70 100644
>>     --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>     +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>     @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
>>      		fence = sched->ops->run_job(s_job);
>>      		atomic_inc(&sched->hw_rq_count);
>>     -		dma_fence_put(s_job->entity->last_scheduled);
>>     -		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
>>     -
>>      		if (fence) {
>>      			s_fence->parent = dma_fence_get(fence);
>>      			r = dma_fence_add_callback(fence, &s_fence->cb,
>>     --
>>     2.7.4
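
To make David's jobA/jobB scenario concrete, below is a minimal userspace
model of how entity->last_scheduled moves in the normal path versus
recovery. It is only a sketch: struct job, struct entity, submit_job() and
recover_job() are hypothetical stand-ins for drm_sched_job,
drm_sched_entity and the scheduler thread, not the kernel API.

    /*
     * Minimal userspace model of the ordering discussed above. The
     * types and helpers here are simplified stand-ins, not the real
     * scheduler structures.
     */
    #include <stdio.h>

    struct job {
    	const char *name;
    	int finished;               /* stand-in for the finished fence */
    };

    struct entity {
    	struct job *last_scheduled; /* last job handed to the hardware */
    };

    /* Normal path: last_scheduled is updated *before* the job joins the
     * mirror list, so every job on the mirror list is already covered
     * by last_scheduled. */
    static void submit_job(struct entity *e, struct job *j)
    {
    	e->last_scheduled = j;  /* step 1: update last sched fence */
    	/* step 2: job begins, i.e. is added to the mirror list */
    }

    /* Recovery path (TDR): jobs already on the mirror list are re-run
     * in order. The patch under discussion removes the last_scheduled
     * update from this path. */
    static void recover_job(struct entity *e, struct job *j, int update)
    {
    	if (update)
    		e->last_scheduled = j;
    }

    int main(void)
    {
    	struct job a = { "jobA", 0 }, b = { "jobB", 0 };
    	struct entity e = { 0 };

    	/* Both jobs were submitted before the hang, so last_scheduled
    	 * already points at jobB: */
    	submit_job(&e, &a);
    	submit_job(&e, &b);
    	printf("before TDR:          last_scheduled = %s\n",
    	       e.last_scheduled->name);

    	/* Pre-patch behavior: re-running jobA moves last_scheduled
    	 * back to jobA until jobB is re-run as well. */
    	recover_job(&e, &a, 1);
    	printf("TDR, with update:    last_scheduled = %s\n",
    	       e.last_scheduled->name);

    	/* Patched behavior: last_scheduled stays at jobB, matching the
    	 * invariant that every job on the mirror list already passed
    	 * the last sched fence -- but, as David notes, jobB itself has
    	 * not been re-submitted yet at this point. */
    	e.last_scheduled = &b;
    	recover_job(&e, &a, 0);
    	printf("TDR, without update: last_scheduled = %s\n",
    	       e.last_scheduled->name);
    	return 0;
    }

Which final state is correct is exactly what the thread disputes: Pixel's
patch keeps last_scheduled at jobB->finished throughout TDR, while David's
point 3 is that this is only safe if jobB is guaranteed to be re-submitted
before the entity is cleaned up.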