On 9/26/19 11:23 AM, Steven Price wrote:
> On 26/09/2019 16:14, Grodzovsky, Andrey wrote:
>> On 9/26/19 10:16 AM, Steven Price wrote:
>>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>>> it is called as the condition of wait_event_interruptible() it must not
>>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>>
>>> Instead let's rename drm_sched_cleanup_jobs() to
>>> drm_sched_get_cleanup_job() and simply return a job for processing if
>>> there is one. The caller can then call the free_job() callback outside
>>> the wait_event_interruptible() where sleeping is possible before
>>> re-checking and returning to sleep if necessary.
>>>
>>> Signed-off-by: Steven Price <steven.price@xxxxxxx>
>>> ---
>>> Changes from v3:
>>>  * drm_sched_main() re-arms the timeout for the next job after calling
>>>    free_job()
>>>
>>>  drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
>>>  1 file changed, 26 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 9a0ee74d82dc..148468447ba9 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>>  }
>>>
>>>  /**
>>> - * drm_sched_cleanup_jobs - destroy finished jobs
>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>>   *
>>>   * @sched: scheduler instance
>>>   *
>>> - * Remove all finished jobs from the mirror list and destroy them.
>>> + * Returns the next finished job from the mirror list (if there is one)
>>> + * ready for it to be destroyed.
>>>   */
>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>> +static struct drm_sched_job *
>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>  {
>>> +	struct drm_sched_job *job = NULL;
>>>  	unsigned long flags;
>>>
>>>  	/* Don't destroy jobs while the timeout worker is running */
>>>  	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>  	    !cancel_delayed_work(&sched->work_tdr))
>>> -		return;
>>> -
>>> +		return NULL;
>>>
>>> -	while (!list_empty(&sched->ring_mirror_list)) {
>>> -		struct drm_sched_job *job;
>>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>
>>> -		job = list_first_entry(&sched->ring_mirror_list,
>>> +	job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>  				       struct drm_sched_job, node);
>>> -		if (!dma_fence_is_signaled(&job->s_fence->finished))
>>> -			break;
>>>
>>> -		spin_lock_irqsave(&sched->job_list_lock, flags);
>>> +	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>  		/* remove job from ring_mirror_list */
>>>  		list_del_init(&job->node);
>>> -		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>> -
>>> -		sched->ops->free_job(job);
>>> +	} else {
>>> +		job = NULL;
>>> +		/* queue timeout for next job */
>>> +		drm_sched_start_timeout(sched);
>>>  	}
>>>
>>> -	/* queue timeout for next job */
>>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>>> -	drm_sched_start_timeout(sched);
>>>  	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>
>>> +	return job;
>>>  }
>>>
>>>  /**
>>> @@ -698,12 +696,21 @@ static int drm_sched_main(void *param)
>>>  		struct drm_sched_fence *s_fence;
>>>  		struct drm_sched_job *sched_job;
>>>  		struct dma_fence *fence;
>>> +		struct drm_sched_job *cleanup_job = NULL;
>>>
>>>  		wait_event_interruptible(sched->wake_up_worker,
>>> -					 (drm_sched_cleanup_jobs(sched),
>>> +					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>>>  					 (!drm_sched_blocked(sched) &&
>>>  					  (entity = drm_sched_select_entity(sched))) ||
>>> -					 kthread_should_stop()));
>>> +					 kthread_should_stop());
>>> +
>>> +		while (cleanup_job) {
>>> +			sched->ops->free_job(cleanup_job);
>>> +			/* queue timeout for next job */
>>> +			drm_sched_start_timeout(sched);
>>> +
>>> +			cleanup_job = drm_sched_get_cleanup_job(sched);
>>> +		}
>>
>> Why is drm_sched_start_timeout called both here and inside
>> drm_sched_get_cleanup_job? And also why call it multiple times in the
>> loop instead of only once after the loop is done?
>
> Christian pointed out to me that the first thing
> drm_sched_get_cleanup_job does is call cancel_delayed_work(), and if
> that returns false then it bails out with a NULL return. So to actually
> get another job (if one exists) the timeout has to be restarted.

For this case, where the timeout work is already in progress, note that
drm_sched_job_timedout restarts the timeout at its end, so it should be
OK to restart the timeout unconditionally inside
drm_sched_get_cleanup_job as it was done before.

>
> It's also necessary to restart the timeout in the case where the return
> is NULL which is handled in the function itself.
>
> TBH I'm not sure whether this while loop is worth it - it may be better
> to replace it with simply:
>
> 	if (cleanup_job) {
> 		sched->ops->free_job(cleanup_job);
> 		/* queue timeout for next job */
> 		drm_sched_start_timeout(sched);
> 	}
>
> The outer loop would then handle the next call to
> drm_sched_get_cleanup_job() as necessary.

What outer loop?

Andrey

>
> Steve

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
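
For illustration only, the "if" variant discussed above can be sketched
in plain userspace C. This is not the DRM scheduler code: struct job,
struct sched, get_cleanup_job(), free_job() and worker_iteration() are
hypothetical stand-ins, locking and the timeout re-arm are left out,
and the sketch assumes that "the outer loop" refers to the scheduler
thread's main loop, with worker_iteration() playing the part of one
pass through it. The point it tries to show is the split between a
fetch step that never sleeps and a free step that runs outside it.

```c
/* Userspace sketch only: hypothetical names, not the DRM scheduler API. */
#include <assert.h>
#include <stddef.h>

struct job {
	struct job *next;
	int finished;		/* stands in for a signaled "finished" fence */
};

struct sched {
	struct job *head;	/* stands in for ring_mirror_list */
	int freed;		/* counts free_job() calls, for illustration */
};

/*
 * Non-sleeping step: unlink and return the first job if it has finished,
 * otherwise return NULL. It never calls the free callback, so it could be
 * evaluated as a wait condition.
 */
static struct job *get_cleanup_job(struct sched *s)
{
	struct job *j = s->head;

	if (!j || !j->finished)
		return NULL;

	s->head = j->next;
	j->next = NULL;
	return j;
}

/* Stand-in for sched->ops->free_job(); a real callback may sleep. */
static void free_job(struct sched *s, struct job *j)
{
	assert(j->next == NULL);	/* job must already be unlinked */
	s->freed++;
}

/*
 * One pass of a hypothetical worker loop using the "if" variant: at most
 * one job is freed per pass, and the next pass fetches the next finished
 * job. Returns 1 if a job was freed, 0 otherwise.
 */
static int worker_iteration(struct sched *s)
{
	struct job *cleanup_job = get_cleanup_job(s);

	if (!cleanup_job)
		return 0;

	free_job(s, cleanup_job);
	return 1;
}
```

With two finished jobs queued ahead of an unfinished one, two passes
each free one job and the third pass frees nothing, so under these
assumptions the list is drained just as the while loop would drain it,
one job per pass.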