On 2020-02-28 2:47 a.m., Christian König wrote: > Am 28.02.20 um 06:08 schrieb Luben Tuikov: >> On 2020-02-27 4:40 p.m., Nirmoy Das wrote: >>> implement drm_sched_entity_modify_sched() which can modify existing >>> sched_list with a different one. This is going to be helpful when >>> userspace changes priority of a ctx/entity then driver can switch to >>> corresponding hw shced list for that priority >>> >>> Signed-off-by: Nirmoy Das <nirmoy.das@xxxxxxx> >>> --- >>> drivers/gpu/drm/scheduler/sched_entity.c | 24 ++++++++++++++++++++++++ >>> include/drm/gpu_scheduler.h | 4 ++++ >>> 2 files changed, 28 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >>> index 63bccd201b97..711e9d504bcb 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>> @@ -83,6 +83,30 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, >>> } >>> EXPORT_SYMBOL(drm_sched_entity_init); >>> >>> +/** >>> + * drm_sched_entity_modify_sched - Modify sched of an entity >>> + * >>> + * @entity: scheduler entity to init >>> + * @sched_list: the list of new drm scheds which will replace >>> + * existing entity->sched_list >>> + * @num_sched_list: number of drm sched in sched_list >>> + * >>> + * Returns 0 on success or a negative error code on failure. >>> + */ >>> +int drm_sched_entity_modify_sched(struct drm_sched_entity *entity, >>> + struct drm_gpu_scheduler **sched_list, >>> + unsigned int num_sched_list) >>> +{ >>> + if (!(entity && sched_list && (num_sched_list == 0 || sched_list[0]))) >>> + return -EINVAL; >> This seems unmaintainable. I'd write it in its natural form: > > This is probably just copy & pasted from the init function and complete > overkill here. It should be an easy rejection then. Statements like this make the code unmaintainable. Regardless of whether it was copy-and-pasted I wanted to emphasize the lack of simplification of what was being done. We need to put intention and sense into what we're doing, scrutinizing every line we put into a patch. This is why I suggested the simplification here: > >> >> int drm_sched_entity_modify_sched(struct drm_sched_entity *entity, >> struct drm_gpu_scheduler **sched_list, >> unsigned int num_sched_list) >> { >> if (entity && sched_list && (num_sched_list == 0 || sched_list[0] != NULL)) { >> entity->sched_list = num_sched_list > 1 ? sched_list : NULL; >> entity->num_sched_list = num_sched_list; >> return 0; >> } else { >> return -EINVAL; >> } >> } > > Actually that's a rather bad idea. Error handling should always be in I actually don't think that it is a "rather bad idea". At all. I actually think that it makes this leaf function more clear to understand as the conditional would read like a sentence in prose. > the form of: > > if (check_error || missing_prerequisite) > return_or_goto_cleanup; I don't think we should generalize across the board. We should be more flexible in order to create clear and maintainable code. > >> That's too heavy. Can we improve the architecture >> so we don't have to check for this in leaf functions like this one? >> We can just return a parameterization. >> >> Why would this be called with entity being NULL? >> Or with sched_list being NULL? Or num_sched_list being not zero >> yet sched_list[0] being NULL? Why not make sure that sched_list[0] is >> never NULL and that num_sched_list is greater than 0 always? >> >> Does this make it to user space? >> Why would the state of execution be one such that this is true/false >> for the code to return -EINVAL? >> From patch 3/4 it seems that an error is printed inside amdgpu_ctx_priority_override() >> and the result is not propagated up the stack. >> >> I think we should improve the code where here this condition above >> is never true. Then we can use parameterization for those two >> statements below: >> >>> + >>> + entity->sched_list = num_sched_list > 1 ? sched_list : NULL; >> So if we're here, we know from the top conditional that >> either num_sched_list is 0 or that sched_list[0] not NULL >> or both. >> >> So if num_sched_list is 0 or 1 we return NULL? >> And if num_sched_list is 2 or greater we return sched_list >> of which sched_list[0] could be NULL? > > This is also copy&pasted from the init code and completely wrong here. So even more reasons to reject this patch. > > What we should do instead is just: WARN_ON(!num_sched_list || !sched_list); Again, what does that *mean*? What does the check mean and what does the num_sched_list == 0 or sched_list == NULL mean? And how did we get into a situation like this where either or both could be nil? Wouldn't it be better to simplify or re-architecture this (we only recently decided to hide physical rings from user-space), so that the code is elegant (meaning no if-else) yet flexible and straightforward? > > And to the checking for not keeping around the scheduler list in the > init function. > >> Why not fix the architecture so that this is simply copied? > > We had that and moved away from it because the scheduler list is > actually const and shouldn't be allocated with each entity (which we can > easily have thousands of). I think that peppering the code with if-else conditionals everywhere as these patch-series into the DRM scheduler have been, would make the code unmaintainable in the long run. Regards, Luben > > Regards, > Christian. > >> (in which case >> we can simply index-parameterize it and simply copy it. >> Why are there so many checks everywhere? >> >>> + entity->num_sched_list = num_sched_list; >>> + >> I mean, all we're doing in this function is initializing >> entity->sched_list and entity->num_sched_list. Why does this >> function have to be so complex and do so many checks? >> Can we improve/fix the architecture instead? >> >> Regards, >> Luben >> >> >>> + return 0; >>> +} >>> +EXPORT_SYMBOL(drm_sched_entity_modify_sched); >>> + >>> /** >>> * drm_sched_entity_is_idle - Check if entity is idle >>> * >>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >>> index 589be851f8a1..0c164a96d51b 100644 >>> --- a/include/drm/gpu_scheduler.h >>> +++ b/include/drm/gpu_scheduler.h >>> @@ -297,6 +297,10 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched); >>> int drm_sched_job_init(struct drm_sched_job *job, >>> struct drm_sched_entity *entity, >>> void *owner); >>> +int drm_sched_entity_modify_sched(struct drm_sched_entity *entity, >>> + struct drm_gpu_scheduler **sched_list, >>> + unsigned int num_sched_list); >>> + >>> void drm_sched_job_cleanup(struct drm_sched_job *job); >>> void drm_sched_wakeup(struct drm_gpu_scheduler *sched); >>> void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad); >>> > _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx