On Fri, 2024-09-13 at 12:56 +0100, Tvrtko Ursulin wrote: > > Hi, > > On 28/08/2024 10:41, Philipp Stanner wrote: > > drm_sched_job_init() has no control over how users allocate struct > > drm_sched_job. Unfortunately, the function can also not set some > > struct > > members such as job->sched. > > job->sched usage from within looks like a bug. But not related to the > memset you add. > > For this one something like this looks easiest for a start: > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index ab53ab486fe6..877113b01af2 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -788,7 +788,7 @@ int drm_sched_job_init(struct drm_sched_job *job, > * or worse--a blank screen--leave a trail in the > * logs, so this can be debugged easier. > */ > - drm_err(job->sched, "%s: entity has no rq!\n", > __func__); > + pr_err("%s: entity has no rq!\n", __func__); > return -ENOENT; > } > > Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to > variable > number of run-queues") > Cc: <stable@xxxxxxxxxxxxxxx> # v6.7+ Danilo and I already solved that: https://lore.kernel.org/all/20240827074521.12828-2-pstanner@xxxxxxxxxx/ > > > This could theoretically lead to UB by users dereferencing the > > struct's > > pointer members too early. > > Hmm if drm_sched_job_init returned an error callers should not > dereference anything. What was actually the issue you were debugging? I was learning about the scheduler, wrote a dummy driver and had awkward behavior. Turned out it was this pointer not being initialized. I would have seen it immediately if it were NULL. The actual issue was and is IMO that a function called drm_sched_job_init() initializes the job. But it doesn't, it only partially initializes it. Only after drm_sched_job_arm() ran you're actually ready to go. > > Adding a memset is I think not the best solution since it is very > likely > redundant to someone doing a kzalloc in the first place. It is redundant in most cases, but it is effectively for free. I measured the runtime with 1e6 jobs with and without memset and there was no difference. P. > > Regards, > > Tvrtko > > > It is easier to debug such issues if these pointers are initialized > > to > > NULL, so dereferencing them causes a NULL pointer exception. > > Accordingly, drm_sched_entity_init() does precisely that and > > initializes > > its struct with memset(). > > > > Initialize parameter "job" to 0 in drm_sched_job_init(). > > > > Signed-off-by: Philipp Stanner <pstanner@xxxxxxxxxx> > > --- > > No changes in v2. > > --- > > drivers/gpu/drm/scheduler/sched_main.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > > b/drivers/gpu/drm/scheduler/sched_main.c > > index 356c30fa24a8..b0c8ad10b419 100644 > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > @@ -806,6 +806,14 @@ int drm_sched_job_init(struct drm_sched_job > > *job, > > return -EINVAL; > > } > > > > + /* > > + * We don't know for sure how the user has allocated. > > Thus, zero the > > + * struct so that unallowed (i.e., too early) usage of > > pointers that > > + * this function does not set is guaranteed to lead to a > > NULL pointer > > + * exception instead of UB. > > + */ > > + memset(job, 0, sizeof(*job)); > > + > > job->entity = entity; > > job->credits = credits; > > job->s_fence = drm_sched_fence_alloc(entity, owner); >