On 16.04.19 at 17:42, Grodzovsky, Andrey wrote:
> On 4/16/19 10:58 AM, Grodzovsky, Andrey wrote:
>> On 4/16/19 10:43 AM, Koenig, Christian wrote:
>>> On 16.04.19 at 16:36, Grodzovsky, Andrey wrote:
>>>> On 4/16/19 5:47 AM, Christian König wrote:
>>>>> On 15.04.19 at 23:17, Eric Anholt wrote:
>>>>>> Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> writes:
>>>>>>
>>>>>>> From: Christian König <christian.koenig@xxxxxxx>
>>>>>>>
>>>>>>> We now destroy finished jobs from the worker thread to make sure that
>>>>>>> we never destroy a job that is currently in timeout processing.
>>>>>>> This way we avoid holding the lock around the ring mirror list in
>>>>>>> drm_sched_stop, which should solve a deadlock reported by a user.
>>>>>>>
>>>>>>> v2: Remove unused variable.
>>>>>>>
>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
>>>>>>>
>>>>>>> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  17 ++--
>>>>>>>   drivers/gpu/drm/etnaviv/etnaviv_dump.c     |   4 -
>>>>>>>   drivers/gpu/drm/etnaviv/etnaviv_sched.c    |   9 +-
>>>>>>>   drivers/gpu/drm/scheduler/sched_main.c     | 138 +++++++++++++++++------------
>>>>>>>   drivers/gpu/drm/v3d/v3d_sched.c            |   9 +-
>>>>>> Missing corresponding panfrost and lima updates. You should probably
>>>>>> pull in drm-misc for hacking on the scheduler.
>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> index ce7c737b..8efb091 100644
>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> @@ -232,11 +232,18 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
>>>>>>>          /* block scheduler */
>>>>>>>          for (q = 0; q < V3D_MAX_QUEUES; q++)
>>>>>>> -                drm_sched_stop(&v3d->queue[q].sched);
>>>>>>> +                drm_sched_stop(&v3d->queue[q].sched, sched_job);
>>>>>>>          if(sched_job)
>>>>>>>                  drm_sched_increase_karma(sched_job);
>>>>>>> +        /*
>>>>>>> +         * Guilty job did complete and hence needs to be manually removed
>>>>>>> +         * See drm_sched_stop doc.
>>>>>>> +         */
>>>>>>> +        if (list_empty(&sched_job->node))
>>>>>>> +                sched_job->sched->ops->free_job(sched_job);
>>>>>> If the if (sched_job) is necessary up above, then this should clearly be
>>>>>> under it.
>>>>>>
>>>>>> But, can we please have a core scheduler thing we call here instead of
>>>>>> drivers all replicating it?
>>>>> Yeah, that's also something I noted before.
>>>>>
>>>>> The essential problem is that we remove finished jobs from the mirror list
>>>>> and so need to destroy them, because otherwise we leak them.
>>>>>
>>>>> An alternative approach here would be to keep the jobs on the ring mirror
>>>>> list, but not submit them again.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>> I would really prefer to avoid this; it means adding an extra flag to
>>>> sched_job to check in each iteration of the ring mirror list.
>>> Mhm, why actually? We just need to check if the scheduler fence is signaled.
>> OK, I see it's equivalent, but this is still an extra check for all the
>> iterations.
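
To make that trade-off concrete, the per-job check being discussed would amount
to something like the rough sketch below inside the loop that walks the ring
mirror list when the scheduler is restarted. This is not part of the patch,
only an illustration; the field and callback names (ring_mirror_list,
s_fence->finished, ops->run_job) are from memory of the scheduler as it looked
around this series.

    /*
     * Untested sketch: instead of taking finished jobs off the ring
     * mirror list and freeing them, leave them on the list and simply
     * skip resubmitting any job whose scheduler fence already signaled.
     * 'sched' is the struct drm_gpu_scheduler being restarted.
     */
    struct drm_sched_job *s_job, *tmp;

    list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
            struct drm_sched_fence *s_fence = s_job->s_fence;

            /* already completed, nothing to run again */
            if (dma_fence_is_signaled(&s_fence->finished))
                    continue;

            /*
             * Resubmit to the hardware as before; hooking the returned
             * hardware fence back up to the scheduler fence is omitted
             * in this sketch.
             */
            sched->ops->run_job(s_job);
    }

That dma_fence_is_signaled() call is the extra per-iteration check referred to
above; the question is whether it is preferable to freeing finished jobs from
the worker thread.
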
>>
>>>> What about changing the signature of drm_sched_backend_ops.timedout_job
>>>> to return a drm_sched_job * instead of void? This way we can return the
>>>> guilty job from the driver-specific handler to the generic
>>>> drm_sched_job_timedout and release it there.
>>> Well, the timeout handler already has the job, so returning it doesn't
>>> make much sense.
>>>
>>> The problem is rather that the timeout handler doesn't know whether it
>>> should destroy the job or not.
>> But the driver-specific handler does, and returning either the pointer to
>> the job or NULL would give an indication of that. We could even return a
>> bool.
>>
>> Andrey
>
> Thinking a bit more about this: the way this check is done now,
> "if (list_empty(&sched_job->node)) then free the sched_job", actually makes
> it possible to move it as is from the driver-specific callbacks into
> drm_sched_job_timedout without any other changes.

Oh, well, that sounds like a good idea offhand. I need to see the final
code, but it's at least the best idea so far.

Christian.

>
> Andrey
>
>>> Christian.
>>>
>>>> Andrey
>>>>
>>>>>>> +
>>>>>>>          /* get the GPU back into the init state */
>>>>>>>          v3d_reset(v3d);
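
For reference, moving the "free the guilty job if it already left the mirror
list" check out of the individual drivers and into the common timeout handler,
as suggested above, could look roughly like the sketch below. This is only an
illustration against the scheduler as it was around this series; work_tdr,
ring_mirror_list and the head-of-list lookup are assumptions from that code
base, not the final patch.

    static void drm_sched_job_timedout(struct work_struct *work)
    {
            struct drm_gpu_scheduler *sched =
                    container_of(work, struct drm_gpu_scheduler, work_tdr.work);
            struct drm_sched_job *job;

            /* the job at the head of the ring mirror list is the suspect */
            job = list_first_entry_or_null(&sched->ring_mirror_list,
                                           struct drm_sched_job, node);
            if (!job)
                    return;

            sched->ops->timedout_job(job);

            /*
             * If the guilty job had already completed, drm_sched_stop()
             * took it off the mirror list inside the driver callback;
             * free it here so drivers no longer open-code this (the
             * list_empty() check moved out of v3d/amdgpu/etnaviv
             * unchanged).
             */
            if (list_empty(&job->node))
                    sched->ops->free_job(job);

            /* re-arming of the timeout work is omitted in this sketch */
    }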