Am 10.12.19 um 16:08 schrieb Nirmoy:
I think amdgpu_ctx_init() should check for num_scheds and not call
drm_sched_entity_init()
if its zero.
Ah, that's where that came from. No that is intentionally this way, but
see below.
On 12/10/19 3:47 PM, Nirmoy wrote:
On 12/10/19 2:00 PM, Christian König wrote:
Am 10.12.19 um 13:53 schrieb Nirmoy Das:
entity should not keep copy and maintain sched list for
itself.
Signed-off-by: Nirmoy Das <nirmoy.das@xxxxxxx>
---
drivers/gpu/drm/scheduler/sched_entity.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
b/drivers/gpu/drm/scheduler/sched_entity.c
index f9b6ce29c58f..a5f729f421f8 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -67,17 +67,10 @@ int drm_sched_entity_init(struct
drm_sched_entity *entity,
entity->guilty = guilty;
entity->num_sched_list = num_sched_list;
entity->priority = priority;
- entity->sched_list = kcalloc(num_sched_list,
- sizeof(struct drm_gpu_scheduler *),
GFP_KERNEL);
+ entity->sched_list = sched_list;
Maybe make this "num_sched_list > 1 ? sched_list : NULL" to avoid
accidentally dereferencing a stale pointer to the stack.
Do you mean "num_sched_list >= 1 ? sched_list : NULL"
No, the entity->sched_list field should be NULL when num_sched_list==1.
When num_sched_list==1 the entity->sched_list shouldn't be used and we
can use a dummy list on the stack as parameter. But we should set the
pointer to NULL in this case just to make sure that nobody is
dereferencing it.
- if(!entity->sched_list)
- return -ENOMEM;
init_completion(&entity->entity_idle);
-
- for (i = 0; i < num_sched_list; i++)
- entity->sched_list[i] = sched_list[i];
-
if (num_sched_list)
That check can actually be dropped as well. We return -EINVAL when
the num_sched_list is zero.
This one took me some time to understand. So we don't really return
-EINVAL if num_sched_list == 0
if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0])))
return -EINVAL;
This is coming from below patch. Are we suppose to tolerate IPs with
uninitialized sched so that ctx creation dosn't return error ?
Yes, exactly. That's intentionally this way.
GPU reset sometimes resulted in schedulers being disabled because we
couldn't re-init them. In this case we had entities with an empty
scheduler list.
That can't happen any more after you make the scheduler arrays constant,
but I would stick with that behavior.
Regards,
Christian.
commit 1decbf6bb0b4dc56c9da6c5e57b994ebfc2be3aa
Author: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx>
Date: Wed Jan 30 02:53:19 2019 +0100
drm/sched: Fix entities with 0 rqs.
Some blocks in amdgpu can have 0 rqs.
Job creation already fails with -ENOENT when entity->rq is NULL,
so jobs cannot be pushed. Without a rq there is no scheduler to
pop jobs, and rq selection already does the right thing with a
list of length 0.
So the operations we need to fix are:
- Creation, do not set rq to rq_list[0] if the list can have
length 0.
- Do not flush any jobs when there is no rq.
- On entity destruction handle the rq = NULL case.
- on set_priority, do not try to change the rq if it is NULL.
Regards,
Christian.
entity->rq =
&entity->sched_list[0]->sched_rq[entity->priority];
@@ -312,7 +305,6 @@ void drm_sched_entity_fini(struct
drm_sched_entity *entity)
dma_fence_put(entity->last_scheduled);
entity->last_scheduled = NULL;
- kfree(entity->sched_list);
}
EXPORT_SYMBOL(drm_sched_entity_fini);
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx