Re: [PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready()

On 04/11/2023 00:25, Luben Tuikov wrote:
Hi Tvrtko,

On 2023-11-03 06:39, Tvrtko Ursulin wrote:

On 02/11/2023 22:46, Luben Tuikov wrote:
Eliminate drm_sched_run_job_queue_if_ready() and instead just call
drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
the former function uses drm_sched_select_entity() to determine whether the
scheduler has an entity ready in one of its run-queues, and in the case of
Round-Robin (RR) scheduling, drm_sched_rq_select_entity_rr() does just that:
it selects the _next_ entity which is ready, sets up the run-queue and
completion, and returns that entity. The FIFO scheduling algorithm is
unaffected.

Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
the case of RR scheduling this results in calling select_entity() twice, which
may skip a ready entity if more than one entity is ready. This commit fixes
this by eliminating the if_ready() variant.

Fixes: is missing since the regression already landed.

Ah, yes, thank you for pointing that out. :-)
I'll add one.



Signed-off-by: Luben Tuikov <ltuikov89@xxxxxxxxx>
---
   drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
   1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 98b2ad54fc7071..05816e7cae8c8b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
 }
 EXPORT_SYMBOL(drm_sched_pick_best);
 
-/**
- * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
- * @sched: scheduler instance
- */
-static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
-{
-	if (drm_sched_select_entity(sched))
-		drm_sched_run_job_queue(sched);
-}
-
 /**
  * drm_sched_free_job_work - worker to call free_job
  *
@@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
 		sched->ops->free_job(cleanup_job);
 
 		drm_sched_free_job_queue_if_done(sched);
-		drm_sched_run_job_queue_if_ready(sched);
+		drm_sched_run_job_queue(sched);

It works, but it is a bit wasteful, causing needless CPU wake-ups with a

I'd not worry about "waking up the CPU", as the CPU scheduler would most likely
put the wq on the same CPU due to instruction cache locality.

potentially empty queue, both here and in drm_sched_run_job_work below.

That's true, but if you were to look at the typical execution of this code,
you'd see a string of function entries while the incoming queue is non-empty,
followed by a single empty entry before the worker is taken off the CPU. So it
really isn't a deal-breaker.

That said, there's a way to mitigate this in drm_sched_run_job_work(). I'll see
that it makes it into the next version of the patch.

Okay, I will be keeping an eye on that.

Separately, I might send a patch to do the "re-queue if more pending" in one atomic section. (Instead of re-acquiring the lock.)

Also, as a heads-up: at some point in the next few months I will start looking at the latency and power effects of the "do just one and re-queue" conversion. In ChromeOS milliwatts matter, and some things like media playback have a lot of inter-engine dependencies, so keeping the CPU C-state residency low might matter. It might matter for server media transcode stream-density workloads too, on both power and streams-per-socket metrics.

Regards,

Tvrtko


