On 28/08/2024 10:41, Philipp Stanner wrote:
drm_sched_job_init()'s name suggests that after the function succeeded, parameter "job" will be fully initialized. This is not the case; some members are only later set, notably "job->sched" by drm_sched_job_arm(). Document that drm_sched_job_init() does not set all struct members. Document that job->sched in particular is uninitialized before drm_sched_job_arm(). Signed-off-by: Philipp Stanner <pstanner@xxxxxxxxxx> --- Changes in v2: - Change grammar in the new comments a bit. --- drivers/gpu/drm/scheduler/sched_main.c | 4 ++++ include/drm/gpu_scheduler.h | 7 +++++++ 2 files changed, 11 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index b0c8ad10b419..721373938c1e 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -781,6 +781,10 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs); * Drivers must make sure drm_sched_job_cleanup() if this function returns * successfully, even when @job is aborted before drm_sched_job_arm() is called. * + * Note that this function does not assign a valid value to each struct member + * of struct drm_sched_job. Take a look at that struct's documentation to see + * who sets which struct member with what lifetime.
First sentence is fine, but the second I don't see the those details in struct drm_sched_job. (And I am not saying that they must be listed. IMO at some point it is better to have a high level overview than describe the lifetime rules with individual members.)
+ * * WARNING: amdgpu abuses &drm_sched.ready to signal when the hardware * has died, which can mean that there's no valid runqueue for a @entity. * This function returns -ENOENT in this case (which probably should be -EIO as diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 5acc64954a88..04a268cd22f1 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -337,6 +337,13 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f); struct drm_sched_job { struct spsc_node queue_node; struct list_head list; + + /* + * The scheduler this job is or will be scheduled on. + * + * Gets set by drm_sched_arm(). Valid until the scheduler's backend_ops + * callback "free_job()" has been called.
This is interesting - I was not sure where lifetime for job->sched is defined and couldn't find it browsing around. Where did you find the clues to tie it to the free_job() callback?
Regards, Tvrtko
+ */ struct drm_gpu_scheduler *sched; struct drm_sched_fence *s_fence;