Inlined: On 2022-08-22 16:09, Andrey Grodzovsky wrote: > Poblem: Given many entities competing for same rq on ^Problem > same scheduler an uncceptabliy long wait time for some ^unacceptably > jobs waiting stuck in rq before being picked up are > observed (seen using GPUVis). > The issue is due to Round Robin policy used by scheduler > to pick up the next entity for execution. Under stress > of many entities and long job queus within entity some ^queues > jobs could be stack for very long time in it's entity's > queue before being popped from the queue and executed > while for other entites with samller job queues a job ^entities; smaller > might execute ealier even though that job arrived later ^earlier > then the job in the long queue. > > Fix: > Add FIFO selection policy to entites in RQ, chose next enitity > on rq in such order that if job on one entity arrived > ealrier then job on another entity the first job will start > executing ealier regardless of the length of the entity's job > queue. > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> > Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@xxxxxxx> > --- > drivers/gpu/drm/scheduler/sched_entity.c | 2 + > drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- > include/drm/gpu_scheduler.h | 8 +++ > 3 files changed, 71 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c > index 6b25b2f4f5a3..3bb7f69306ef 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) > atomic_inc(entity->rq->sched->score); > WRITE_ONCE(entity->last_user, current->group_leader); > first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); > + sched_job->submit_ts = ktime_get(); > + > > /* first job wakes up scheduler */ > if (first) { > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 68317d3a7a27..c123aa120d06 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -59,6 +59,19 @@ > #define CREATE_TRACE_POINTS > #include "gpu_scheduler_trace.h" > > + > + > +int drm_sched_policy = -1; > + > +/** > + * DOC: sched_policy (int) > + * Used to override default entites scheduling policy in a run queue. > + */ > +MODULE_PARM_DESC(sched_policy, > + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); > +module_param_named(sched_policy, drm_sched_policy, int, 0444); As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, say the RR. I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. > + > + > #define to_drm_sched_job(sched_job) \ > container_of((sched_job), struct drm_sched_job, queue_node) > > @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, > } > > /** > - * drm_sched_rq_select_entity - Select an entity which could provide a job to run > + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run > * > * @rq: scheduler run queue to check. > * > - * Try to find a ready entity, returns NULL if none found. > + * Try to find a ready entity, in round robin manner. > + * > + * Returns NULL if none found. > */ > static struct drm_sched_entity * > -drm_sched_rq_select_entity(struct drm_sched_rq *rq) > +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) > { > struct drm_sched_entity *entity; > > @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) > return NULL; > } > > +/** > + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run > + * > + * @rq: scheduler run queue to check. > + * > + * Try to find a ready entity, based on FIFO order of jobs arrivals. > + * > + * Returns NULL if none found. > + */ > +static struct drm_sched_entity * > +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) > +{ > + struct drm_sched_entity *tmp, *entity = NULL; > + ktime_t oldest_ts = KTIME_MAX; > + struct drm_sched_job *sched_job; > + > + spin_lock(&rq->lock); > + > + list_for_each_entry(tmp, &rq->entities, list) { > + > + if (drm_sched_entity_is_ready(tmp)) { > + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); > + > + if (ktime_before(sched_job->submit_ts, oldest_ts)) { > + oldest_ts = sched_job->submit_ts; > + entity = tmp; > + } > + } > + } Here I think we need an O(1) lookup of the next job to pick out to run. I see a number of optimizations, for instance keeping the current/oldest timestamp in the rq struct itself, or better yet keeping the next job to pick out to run at a head of list (a la timer wheel implementation). For suck an optimization to work, you'd prep things up on job insertion, rather than on job removal/pick to run. I'm also surprised that there is no job transition from one queue to another, as it is picked to run next--for the above optimizations to implemented, you'd want a state transition from (state) queue to queue. Regards, Luben > + > + if (entity) { > + rq->current_entity = entity; > + reinit_completion(&entity->entity_idle); > + } > + > + spin_unlock(&rq->lock); > + return entity; > +} > + > /** > * drm_sched_job_done - complete a job > * @s_job: pointer to the job which is done > @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) > > /* Kernel run queue has higher priority than normal run queue*/ > for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { > - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); > + entity = drm_sched_policy != 1 ? > + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : > + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); > + > if (entity) > break; > } > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h > index addb135eeea6..95865881bfcf 100644 > --- a/include/drm/gpu_scheduler.h > +++ b/include/drm/gpu_scheduler.h > @@ -314,6 +314,14 @@ struct drm_sched_job { > > /** @last_dependency: tracks @dependencies as they signal */ > unsigned long last_dependency; > + > + /** > + * @submit_ts: > + * > + * Marks job submit time > + */ > + ktime_t submit_ts; > + > }; > > static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,