On 2022-08-23 14:13, Andrey Grodzovsky wrote: > > On 2022-08-23 12:58, Luben Tuikov wrote: >> Inlined: >> >> On 2022-08-22 16:09, Andrey Grodzovsky wrote: >>> Poblem: Given many entities competing for same rq on >> ^Problem >> >>> same scheduler an uncceptabliy long wait time for some >> ^unacceptably >> >>> jobs waiting stuck in rq before being picked up are >>> observed (seen using GPUVis). >>> The issue is due to Round Robin policy used by scheduler >>> to pick up the next entity for execution. Under stress >>> of many entities and long job queus within entity some >> ^queues >> >>> jobs could be stack for very long time in it's entity's >>> queue before being popped from the queue and executed >>> while for other entites with samller job queues a job >> ^entities; smaller >> >>> might execute ealier even though that job arrived later >> ^earlier >> >>> then the job in the long queue. >>> >>> Fix: >>> Add FIFO selection policy to entites in RQ, chose next enitity >>> on rq in such order that if job on one entity arrived >>> ealrier then job on another entity the first job will start >>> executing ealier regardless of the length of the entity's job >>> queue. >>> >>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> >>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@xxxxxxx> >>> --- >>> drivers/gpu/drm/scheduler/sched_entity.c | 2 + >>> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- >>> include/drm/gpu_scheduler.h | 8 +++ >>> 3 files changed, 71 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >>> index 6b25b2f4f5a3..3bb7f69306ef 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) >>> atomic_inc(entity->rq->sched->score); >>> WRITE_ONCE(entity->last_user, current->group_leader); >>> first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); >>> + sched_job->submit_ts = ktime_get(); >>> + >>> >>> /* first job wakes up scheduler */ >>> if (first) { >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >>> index 68317d3a7a27..c123aa120d06 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>> @@ -59,6 +59,19 @@ >>> #define CREATE_TRACE_POINTS >>> #include "gpu_scheduler_trace.h" >>> >>> + >>> + >>> +int drm_sched_policy = -1; >>> + >>> +/** >>> + * DOC: sched_policy (int) >>> + * Used to override default entites scheduling policy in a run queue. >>> + */ >>> +MODULE_PARM_DESC(sched_policy, >>> + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); >>> +module_param_named(sched_policy, drm_sched_policy, int, 0444); >> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, >> say the RR. >> >> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. > > > Christian is not against it, just against adding 'auto' here - like the > default. Exactly what I said. Also, I still think an O(1) scheduling (picking next to run) should be what we strive for in such a FIFO patch implementation. A FIFO mechanism is by it's nature an O(1) mechanism for picking the next element. Regards, Luben > > >> >>> + >>> + >>> #define to_drm_sched_job(sched_job) \ >>> container_of((sched_job), struct drm_sched_job, queue_node) >>> >>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, >>> } >>> >>> /** >>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run >>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run >>> * >>> * @rq: scheduler run queue to check. >>> * >>> - * Try to find a ready entity, returns NULL if none found. >>> + * Try to find a ready entity, in round robin manner. >>> + * >>> + * Returns NULL if none found. >>> */ >>> static struct drm_sched_entity * >>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) >>> { >>> struct drm_sched_entity *entity; >>> >>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>> return NULL; >>> } >>> >>> +/** >>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run >>> + * >>> + * @rq: scheduler run queue to check. >>> + * >>> + * Try to find a ready entity, based on FIFO order of jobs arrivals. >>> + * >>> + * Returns NULL if none found. >>> + */ >>> +static struct drm_sched_entity * >>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) >>> +{ >>> + struct drm_sched_entity *tmp, *entity = NULL; >>> + ktime_t oldest_ts = KTIME_MAX; >>> + struct drm_sched_job *sched_job; >>> + >>> + spin_lock(&rq->lock); >>> + >>> + list_for_each_entry(tmp, &rq->entities, list) { >>> + >>> + if (drm_sched_entity_is_ready(tmp)) { >>> + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); >>> + >>> + if (ktime_before(sched_job->submit_ts, oldest_ts)) { >>> + oldest_ts = sched_job->submit_ts; >>> + entity = tmp; >>> + } >>> + } >>> + } >> Here I think we need an O(1) lookup of the next job to pick out to run. >> I see a number of optimizations, for instance keeping the current/oldest >> timestamp in the rq struct itself, > > > This was my original design with rb tree based min heap per rq based on > time stamp of > the oldest job waiting in each entity's job queue (head of entity's SPCP > job queue). But how in this > case you record the timestamps of all the jobs waiting in entity's the > SPCP queue behind the head job ? > If you record only the oldest job and more jobs come you have no place > to store their timestamps and so > on next job select after current HEAD how you will know who came before > or after between 2 job queues of > of 2 entities ? > > >> or better yet keeping the next job >> to pick out to run at a head of list (a la timer wheel implementation). >> For suck an optimization to work, you'd prep things up on job insertion, rather >> than on job removal/pick to run. > > > I looked at timer wheel and I don't see how this applies here - it > assumes you know before > job submission your deadline time for job's execution to start - which > we don't so I don't see > how we can use it. This seems more suitable for real time scheduler > implementation where you > have a hard requirement to execute a job by some specific time T. > > I also mentioned a list, obviously there is the brute force solution of > just ordering all jobs in one giant list and get > naturally a FIFO ordering this way with O(1) insertions and extractions > but I assume we don't want one giant jobs queue > for all the entities to avoid more dependeies between them (like locking > the entire list when one specific entity is killed and > needs to remove it's jobs from SW queue). > >> >> I'm also surprised that there is no job transition from one queue to another, >> as it is picked to run next--for the above optimizations to implemented, you'd >> want a state transition from (state) queue to queue. > > > Not sure what you have in mind here ? How this helps ? > > Andrey > > >> >> Regards, >> Luben >> > > In my origianl design > >>> + >>> + if (entity) { >>> + rq->current_entity = entity; >>> + reinit_completion(&entity->entity_idle); >>> + } >>> + >>> + spin_unlock(&rq->lock); >>> + return entity; >>> +} >>> + >>> /** >>> * drm_sched_job_done - complete a job >>> * @s_job: pointer to the job which is done >>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) >>> >>> /* Kernel run queue has higher priority than normal run queue*/ >>> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { >>> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); >>> + entity = drm_sched_policy != 1 ? >>> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : >>> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); >>> + >>> if (entity) >>> break; >>> } >>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >>> index addb135eeea6..95865881bfcf 100644 >>> --- a/include/drm/gpu_scheduler.h >>> +++ b/include/drm/gpu_scheduler.h >>> @@ -314,6 +314,14 @@ struct drm_sched_job { >>> >>> /** @last_dependency: tracks @dependencies as they signal */ >>> unsigned long last_dependency; >>> + >>> + /** >>> + * @submit_ts: >>> + * >>> + * Marks job submit time >>> + */ >>> + ktime_t submit_ts; >>> + >>> }; >>> >>> static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, Regards, -- Luben