Re: [RFC PATCH] drm/amdgpu: allocate entities on demand


On 11/29/19 7:42 PM, Christian König wrote:
Am 29.11.19 um 15:29 schrieb Nirmoy:
Hi Christian,

On 11/26/19 10:45 AM, Christian König wrote:
It looks like a start, but there are numerous things which need to be fixed.

Question number one is: What's that good for? Entities are not the problem here. The real issue is the fence ring and the rq_list.

The rq_list could actually be made constant since it should never be changed by the entity. It is only changed for backward compatibility in drm_sched_entity_set_priority().

So I would start there and cleanup the drm_sched_entity_set_priority() to actually just set a new constant rq list instead.

I am missing some context here. Can you please explain a bit more? I looked over and over again, but I still don't understand what you mean by a new constant rq list :/

Ok, that needs a somewhat wider explanation.

The GPU scheduler consists mainly of drm_gpu_scheduler instances. Each of those instances contains multiple runqueues with different priorities (5 IIRC).

Now for each entity we give a list of runqueues where this entity can be served, i.e. where the jobs pushed to the entity are executed.

The entity itself keeps a copy of that runqueue list because drm_sched_entity_set_priority() modifies this runqueue list.

But essentially that is complete overkill; the runqueue lists are constant for each amdgpu device, i.e. all contexts should use SDMA0 and SDMA1 in the same way.

In other words, building the list of runqueues should happen only once and not for each context.
Okay, now I understand the real problem. Thanks for the detailed explanation.

Multiple approaches to fix this are possible. One rather elegant solution would be to change the rq list into a scheduler instance list + priority.

Do you mean something like

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 684692a8ed76..ac67f8f098fa 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -81,7 +81,7 @@ enum drm_sched_priority {
 struct drm_sched_entity {
        struct list_head                list;
        struct drm_sched_rq             *rq;
-       struct drm_sched_rq             **rq_list;
+       struct drm_gpu_scheduler        **sched;
        unsigned int                    num_rq_list;
        spinlock_t                      rq_lock;



This way we would also fix the age-old bug that changing the priority of a context could actually mess up already scheduled jobs.

The alternative I noted before would be to drop drm_sched_entity_set_priority() or change it into drm_sched_entity_set_runqueues().
I was working on it, but then I got stuck on a "BUG: kernel NULL pointer dereference, address:" which I am still trying to figure out.

Regards,
Christian.



Then we could embed the fences in amdgpu_ctx_entity as a dynamic array at the end of the structure.

And last we can start to dynamically allocate and initialize the amdgpu_ctx_entity structures.

Regards,
Christian.



_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



