On 9/25/19 12:07 PM, Andrey Grodzovsky wrote:
> On 9/25/19 12:00 PM, Steven Price wrote:
>
>> On 25/09/2019 16:56, Grodzovsky, Andrey wrote:
>>> On 9/25/19 11:14 AM, Steven Price wrote:
>>>
>>>> drm_sched_cleanup_jobs() attempts to free finished jobs; however,
>>>> because it is called as the condition of wait_event_interruptible()
>>>> it must not sleep. Unfortunately some free callbacks (notably for
>>>> Panfrost) do sleep.
>>>>
>>>> Instead let's rename drm_sched_cleanup_jobs() to
>>>> drm_sched_get_cleanup_job() and simply return a job for processing if
>>>> there is one. The caller can then call the free_job() callback outside
>>>> the wait_event_interruptible(), where sleeping is possible, before
>>>> re-checking and returning to sleep if necessary.
>>>>
>>>> Signed-off-by: Steven Price <steven.price@xxxxxxx>
>>>> ---
>>>>  drivers/gpu/drm/scheduler/sched_main.c | 44 ++++++++++++++------------
>>>>  1 file changed, 24 insertions(+), 20 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index 9a0ee74d82dc..0ed4aaa4e6d1 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>>>  }
>>>>
>>>>  /**
>>>> - * drm_sched_cleanup_jobs - destroy finished jobs
>>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>>>   *
>>>>   * @sched: scheduler instance
>>>>   *
>>>> - * Remove all finished jobs from the mirror list and destroy them.
>>>> + * Returns the next finished job from the mirror list (if there is one)
>>>> + * ready for it to be destroyed.
>>>>   */
>>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>>> +static struct drm_sched_job *
>>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>>  {
>>>> +	struct drm_sched_job *job = NULL;
>>>>  	unsigned long flags;
>>>>
>>>>  	/* Don't destroy jobs while the timeout worker is running */
>>>>  	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>  	    !cancel_delayed_work(&sched->work_tdr))
>>>> -		return;
>>>> -
>>>> -
>>>> -	while (!list_empty(&sched->ring_mirror_list)) {
>>>> -		struct drm_sched_job *job;
>>>> +		return NULL;
>>>>
>>>> -		job = list_first_entry(&sched->ring_mirror_list,
>>>> +	job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>  				       struct drm_sched_job, node);
>>>> -		if (!dma_fence_is_signaled(&job->s_fence->finished))
>>>> -			break;
>>>>
>>>> -		spin_lock_irqsave(&sched->job_list_lock, flags);
>>>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>> +
>>>> +	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>>  		/* remove job from ring_mirror_list */
>>>>  		list_del_init(&job->node);
>>>> -		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>> -
>>>> -		sched->ops->free_job(job);
>>>> +	} else {
>>>> +		job = NULL;
>>>> +		/* queue timeout for next job */
>>>> +		drm_sched_start_timeout(sched);
>>>>  	}
>>>>
>>>> -	/* queue timeout for next job */
>>>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>> -	drm_sched_start_timeout(sched);
>>>>  	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>> +
>>>> +	return job;
>>>>  }
>>>>
>>>>  /**
>>>> @@ -698,12 +696,18 @@ static int drm_sched_main(void *param)
>>>>  		struct drm_sched_fence *s_fence;
>>>>  		struct drm_sched_job *sched_job;
>>>>  		struct dma_fence *fence;
>>>> +		struct drm_sched_job *cleanup_job = NULL;
>>>>
>>>>  		wait_event_interruptible(sched->wake_up_worker,
>>>> -					 (drm_sched_cleanup_jobs(sched),
>>>> +					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>>>>  					 (!drm_sched_blocked(sched) &&
>>>>  					  (entity = drm_sched_select_entity(sched))) ||
>>>> -					 kthread_should_stop()));
>>>> +					 kthread_should_stop());
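The quoted hunk stops at the wait itself, but the commit message above spells out the other half: free_job() is now called outside wait_event_interruptible(). The caller side presumably looks something like this after the wait returns (a sketch of the intent, not the trimmed remainder of the patch):

		if (cleanup_job) {
			/* Back in normal kthread context here, outside the
			 * wait condition, so a free_job() callback that
			 * sleeps (e.g. Panfrost's) is fine. */
			sched->ops->free_job(cleanup_job);
			/* queue timeout for next job */
			drm_sched_start_timeout(sched);
		}

In other words, the only work done inside the wait condition is popping one signaled job off ring_mirror_list; anything that might sleep happens afterwards.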
>>>
>>> Can't we just call drm_sched_cleanup_jobs() right here, remove all the
>>> conditions in wait_event_interruptible() (make it always true), and
>>> after drm_sched_cleanup_jobs() is called test for all those conditions
>>> and return to sleep if they evaluate to false? drm_sched_cleanup_jobs()
>>> is called unconditionally inside wait_event_interruptible() anyway...
>>> This is more of a question for Christian.
>>
>> Christian may know better than me, but I think those conditions need to
>> be in wait_event_interruptible() to avoid race conditions. If we simply
>> replace all the conditions with a literal "true" then
>> wait_event_interruptible() will never actually sleep.
>>
>> Steve
>
> Yes, you're right; it won't work, as I missed that the condition is
> evaluated as the first step of wait_event_interruptible(), before it
> sleeps.
>
> Andrey

Another idea: what about still just relocating drm_sched_cleanup_jobs()
to after wait_event_interruptible(), and also calling it from
drm_sched_fini(), so that when it is not called from drm_sched_main()
(because the conditions never evaluate to true) it is still called one
last time from drm_sched_fini()? I mean, the refactor looks OK to me,
but this way the code becomes somewhat confusing to grasp.

Andrey
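A footnote on the evaluation-order point settled above: wait_event_interruptible() evaluates its condition before ever sleeping and re-evaluates it after every wakeup, which is why a literal "true" condition would never sleep, and why moving the checks out of the macro could miss a wake_up(). A simplified sketch of its shape (the real macro in include/linux/wait.h carries more bookkeeping; only the semantics relevant here are shown):

	DEFINE_WAIT(wait);
	int ret = 0;

	for (;;) {
		/* queue ourselves on the waitqueue *before* checking */
		prepare_to_wait(&wq, &wait, TASK_INTERRUPTIBLE);
		if (cond)                       /* evaluated before any sleep */
			break;
		if (signal_pending(current)) {  /* the "interruptible" part */
			ret = -ERESTARTSYS;
			break;
		}
		schedule();                     /* actually go to sleep */
	}
	finish_wait(&wq, &wait);

Because the task is already on the waitqueue when cond is (re)checked, a wake_up() racing with the check cannot be lost; that is the race Steven is referring to.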