On 3/12/20 9:50 AM, Christian König wrote:
Am 11.03.20 um 21:55 schrieb Nirmoy:
On 3/11/20 9:35 PM, Andrey Grodzovsky wrote:
On 3/11/20 4:32 PM, Nirmoy wrote:
On 3/11/20 9:02 PM, Andrey Grodzovsky wrote:
On 3/11/20 4:00 PM, Andrey Grodzovsky wrote:
On 3/11/20 4:00 PM, Nirmoy Das wrote:
[SNIP]
@@ -1257,6 +1258,9 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
priority = job->base.s_priority;
drm_sched_entity_push_job(&job->base, entity);
+ if (ring->funcs->no_gpu_sched_loadbalance)
+ amdgpu_ctx_disable_gpu_sched_load_balance(entity);
+
Why does this need to be done each time a job is submitted and not
once in drm_sched_entity_init (same for amdgpu_job_submit below)?
Andrey
My bad - not in drm_sched_entity_init but in relevant amdgpu code.
Hi Andrey,
Do you mean drm_sched_job_init() or after creating VCN entities?
Nirmoy
I guess after creating the VCN entities (it has to be amdgpu-specific
code). I just don't get why it needs to be done each time a job is
submitted. Since you set .no_gpu_sched_loadbalance = true anyway, this
is always true, so shouldn't you just initialize the VCN entity with a
scheduler list consisting of one scheduler and that's it?
Assumption: if I understand correctly, we shouldn't be doing load
balancing among VCN jobs within the same context. Christian, James and
Leo can clarify that if I am wrong.
But we can still load-balance VCN jobs among multiple contexts. That
load-balancing decision happens in drm_sched_entity_init(). If we
initialize the VCN entity with one scheduler, then all entities,
irrespective of context, get that one scheduler, which means we are
not utilizing the extra VCN instances.
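To make the trade-off concrete, a minimal sketch of the two options,
assuming the five-argument drm_sched_entity_init() from this series;
the function names here are illustrative, and the sched_list arrays
are assumed to outlive the entity:

#include <drm/gpu_scheduler.h>

/* Full VCN list: the core picks the least-loaded instance, so
 * different contexts spread across the available VCN instances.
 */
static int vcn_entity_init_balanced(struct drm_sched_entity *entity,
				    enum drm_sched_priority priority,
				    struct drm_gpu_scheduler **sched_list,
				    unsigned int num_scheds)
{
	return drm_sched_entity_init(entity, priority, sched_list,
				     num_scheds, NULL);
}

/* One-element list: the entity can never migrate between instances,
 * but if every context is initialized with the same single scheduler,
 * the remaining VCN instances stay idle.
 */
static int vcn_entity_init_pinned(struct drm_sched_entity *entity,
				  enum drm_sched_priority priority,
				  struct drm_gpu_scheduler **one_sched)
{
	return drm_sched_entity_init(entity, priority, one_sched, 1, NULL);
}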
Andrey has a very good point here. So far we only looked at this from
the hardware-requirement side: we can't change the ring after the
first submission any more.
But it is certainly valuable to keep the extra overhead out of the hot
path during command submission.
Ideally we should be calling
amdgpu_ctx_disable_gpu_sched_load_balance() only once, after the first
call to drm_sched_entity_init() for a VCN job. I am not sure how to do
that efficiently.
Another option might be to copy the logic of
drm_sched_entity_get_free_sched() and choose a suitable VCN sched
at/after VCN entity creation.
Yes, but we should not copy the logic but rather refactor it :)
Basically we need a drm_sched_pick_best() function which gets an array
of drm_gpu_scheduler structures and returns the one with the least
load on it.
This function can then be used by VCN to pick one instance before
initializing the entity as well as a replacement for
drm_sched_entity_get_free_sched() to change the scheduler for load
balancing.
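A minimal sketch of what that refactoring could look like, assuming
the least-loaded selection currently done in
drm_sched_entity_get_free_sched() (comparing the schedulers' num_jobs
counters); the body is an illustration, not the final patch:

#include <linux/kernel.h>
#include <drm/drm_print.h>
#include <drm/gpu_scheduler.h>

/* Pick the scheduler with the fewest queued jobs from an array. */
struct drm_gpu_scheduler *
drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
		    unsigned int num_sched_list)
{
	struct drm_gpu_scheduler *sched, *picked_sched = NULL;
	unsigned int i, num_jobs, min_jobs = UINT_MAX;

	for (i = 0; i < num_sched_list; ++i) {
		sched = sched_list[i];

		if (!sched->ready) {
			DRM_WARN("scheduler %s is not ready, skipping\n",
				 sched->name);
			continue;
		}

		/* num_jobs counts jobs pushed to this scheduler that
		 * have not yet completed.
		 */
		num_jobs = atomic_read(&sched->num_jobs);
		if (num_jobs < min_jobs) {
			min_jobs = num_jobs;
			picked_sched = sched;
		}
	}

	return picked_sched;
}

VCN entity creation could then call this once over the VCN scheduler
array and hand a one-element list to drm_sched_entity_init(), keeping
the per-submission hot path untouched.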
This sounds like the optimal solution here.
Thanks Andrey and Christian. I will resend with suggested changes.
Regards,
Christian.
Regards,
Nirmoy