Re: [PATCH v2] drm/sched: Add FIFO sched policy to rq

Luben, just a ping, whenever you have time.

Andrey

On 2022-09-05 01:57, Christian König wrote:


On 03.09.22 at 04:48, Andrey Grodzovsky wrote:
Problem: Given many entities competing for the same rq on
the same scheduler, unacceptably long wait times are
observed (using GPUVis) for some jobs that sit stuck in
the rq before being picked up.
The issue is due to the Round Robin policy used by the scheduler
to pick the next entity for execution. Under stress
of many entities and long job queues within an entity, some
jobs can be stuck for a very long time in their entity's
queue before being popped from the queue and executed,
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
than the job in the long queue.

Fix:
Add a FIFO selection policy to entities in the RQ: choose the next
entity on the rq in such an order that if a job on one entity arrived
earlier than a job on another entity, the first job will start
executing earlier regardless of the length of the entity's job
queue.
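
To make the difference between the two policies concrete, here is a
minimal userspace sketch. It is not the patch's code: the demo_entity
struct and the demo_pick_* helpers are hypothetical stand-ins, and the
FIFO pick uses a plain linear scan purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a scheduler entity.
 * This is NOT struct drm_sched_entity. */
struct demo_entity {
    const uint64_t *submit_ts; /* submit times of its jobs, oldest first */
    size_t head;               /* index of the next job to run */
    size_t count;              /* total number of jobs */
};

static int demo_has_job(const struct demo_entity *e)
{
    return e->head < e->count;
}

/* Round robin: start after the entity picked last time and take the
 * first entity that has a job, ignoring when that job was submitted.
 * Returns the entity index, or -1 if no entity has a job. */
int demo_pick_rr(const struct demo_entity *ents, size_t n, size_t last)
{
    for (size_t i = 1; i <= n; i++) {
        size_t idx = (last + i) % n;

        if (demo_has_job(&ents[idx]))
            return (int)idx;
    }
    return -1;
}

/* FIFO: take the entity whose oldest waiting job has the smallest
 * submit time. Linear scan, O(n), for illustration only.
 * Returns the entity index, or -1 if no entity has a job. */
int demo_pick_fifo(const struct demo_entity *ents, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!demo_has_job(&ents[i]))
            continue;
        if (best < 0 ||
            ents[i].submit_ts[ents[i].head] <
            ents[best].submit_ts[ents[best].head])
            best = (int)i;
    }
    return best;
}
```

With entity A holding jobs submitted at times 1, 2, 3 and entity B
holding one job submitted at time 4, round robin runs B's job right
after A's first job, while the FIFO pick keeps selecting A until its
older jobs are done.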

v2:
Switch to an rb tree structure for entities, keyed on the TS of the
oldest job waiting in the entity's job queue. Improves next
entity extraction to O(1). Entity TS update is
O(log(number of entities in rq)).

Drop default option in module control parameter.
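
The complexity claims in the v2 notes above (O(1) to find the next
entity, O(log n) to update an entity's timestamp) can be sketched in
userspace as follows. Note the assumption: this sketch swaps in a
binary min-heap instead of the rb tree the patch uses, only to stay
short and runnable outside the kernel. All toy_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ENTITIES 64

/* Hypothetical entity: keyed on the submit time of its oldest job. */
struct toy_entity {
    uint64_t oldest_ts;
    size_t heap_pos; /* current index in toy_rq.slot[] */
};

/* Hypothetical run queue: min-heap of entities ordered by oldest_ts. */
struct toy_rq {
    struct toy_entity *slot[TOY_MAX_ENTITIES];
    size_t len;
};

static void toy_place(struct toy_rq *rq, size_t i, struct toy_entity *e)
{
    rq->slot[i] = e;
    e->heap_pos = i;
}

static void toy_sift_up(struct toy_rq *rq, size_t i)
{
    struct toy_entity *e = rq->slot[i];

    while (i > 0) {
        size_t parent = (i - 1) / 2;

        if (rq->slot[parent]->oldest_ts <= e->oldest_ts)
            break;
        toy_place(rq, i, rq->slot[parent]);
        i = parent;
    }
    toy_place(rq, i, e);
}

static void toy_sift_down(struct toy_rq *rq, size_t i)
{
    struct toy_entity *e = rq->slot[i];

    for (;;) {
        size_t child = 2 * i + 1;

        if (child >= rq->len)
            break;
        if (child + 1 < rq->len &&
            rq->slot[child + 1]->oldest_ts < rq->slot[child]->oldest_ts)
            child++;
        if (e->oldest_ts <= rq->slot[child]->oldest_ts)
            break;
        toy_place(rq, i, rq->slot[child]);
        i = child;
    }
    toy_place(rq, i, e);
}

/* O(log n): add an entity. Returns 0 on success, -1 if the rq is full. */
int toy_rq_add(struct toy_rq *rq, struct toy_entity *e, uint64_t ts)
{
    if (rq->len == TOY_MAX_ENTITIES)
        return -1;
    e->oldest_ts = ts;
    toy_place(rq, rq->len, e);
    rq->len++;
    toy_sift_up(rq, e->heap_pos);
    return 0;
}

/* O(log n): the entity's oldest waiting job changed, e.g. a job was popped. */
void toy_rq_update(struct toy_rq *rq, struct toy_entity *e, uint64_t ts)
{
    e->oldest_ts = ts;
    toy_sift_up(rq, e->heap_pos);
    toy_sift_down(rq, e->heap_pos);
}

/* O(1): entity with the oldest waiting job, or NULL if the rq is empty. */
struct toy_entity *toy_rq_first(const struct toy_rq *rq)
{
    return rq->len ? rq->slot[0] : NULL;
}
```

After popping a job from the entity returned by toy_rq_first(), the
caller would call toy_rq_update() with the submit time of that
entity's next job, so the entity is re-sorted among the others.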

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@xxxxxxx>
[SNIP]
  /**
@@ -313,6 +330,14 @@ struct drm_sched_job {
        /** @last_dependency: tracks @dependencies as they signal */
      unsigned long            last_dependency;
+
+
+    /**
+    * @submit_ts:
+    *
+    * Marks job submit time

Maybe write something like "When the job was pushed into the entity queue."

Apart from that I leave it to Luben and you to get this stuff upstream.

Thanks,
Christian.

+    */
+    ktime_t                         submit_ts;
  };
  static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
@@ -501,6 +526,10 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
  void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
                  struct drm_sched_entity *entity);
+void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts,
+                  bool remove_only);
+
+
  int drm_sched_entity_init(struct drm_sched_entity *entity,
                enum drm_sched_priority priority,
                struct drm_gpu_scheduler **sched_list,



