Re: [PATCH] drm/sched: Eliminate drm_sched_run_job_queue_if_ready()

Luben Tuikov <ltuikov89@xxxxxxxxx> · Fri, 3 Nov 2023 20:25:23 -0400

Hi Tvrtko,

On 2023-11-03 06:39, Tvrtko Ursulin wrote:
> 
> On 02/11/2023 22:46, Luben Tuikov wrote:
>> Eliminate drm_sched_run_job_queue_if_ready() and instead just call
>> drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
>> the former function uses drm_sched_select_entity() to determine if the
>> scheduler had an entity ready in one of its run-queues, and in the case of the
>> Round-Robin (RR) scheduling, the function drm_sched_rq_select_entity_rr() does
>> just that, selects the _next_ entity which is ready, sets up the run-queue and
>> completion and returns that entity. The FIFO scheduling algorithm is unaffected.
>>
>> Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
>> in the case of RR scheduling, that would result in calling select_entity()
>> twice, which may result in skipping a ready entity if more than one entity is
>> ready. This commit fixes this by eliminating the if_ready() variant.
> 
> Fixes: is missing since the regression already landed.

Ah, yes, thank you for pointing that out. :-)
I'll add one.

> 
>>
>> Signed-off-by: Luben Tuikov <ltuikov89@xxxxxxxxx>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
>>   1 file changed, 2 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 98b2ad54fc7071..05816e7cae8c8b 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
>>   }
>>   EXPORT_SYMBOL(drm_sched_pick_best);
>>   
>> -/**
>> - * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
>> - * @sched: scheduler instance
>> - */
>> -static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
>> -{
>> -	if (drm_sched_select_entity(sched))
>> -		drm_sched_run_job_queue(sched);
>> -}
>> -
>>   /**
>>    * drm_sched_free_job_work - worker to call free_job
>>    *
>> @@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
>>   		sched->ops->free_job(cleanup_job);
>>   
>>   		drm_sched_free_job_queue_if_done(sched);
>> -		drm_sched_run_job_queue_if_ready(sched);
>> +		drm_sched_run_job_queue(sched);
> 
> It works but is a bit wasteful causing needless CPU wake ups with a 

I'd not worry about "waking up the CPU" as the CPU scheduler would most likely
put the wq on the same CPU by instruction cache locality.

> potentially empty queue, both here and in drm_sched_run_job_work below.

That's true, but if you were to look at the typical execution of
this code you'd see we get a string of function entry when the incoming queue
is non-empty, followed by one empty entry only to be taken off the CPU. So,
it really isn't a breaker.

So, there's a way to mitigate this in drm_sched_run_job_work(). I'll see that it
makes it in the next version of the patch.

Thanks!
-- 
Regards,
Luben
Attachment:
OpenPGP_0x4C15479431A334AF.asc

Description: OpenPGP public key
Attachment:
OpenPGP_signature.asc

Description: OpenPGP digital signature