On 8/28/23 20:41, Matthew Brost wrote:
On Mon, Aug 28, 2023 at 08:04:31PM +0200, Danilo Krummrich wrote:
On 8/11/23 04:31, Matthew Brost wrote:
Rather than calling free_job and run_job in the same work item, have a
dedicated work item for each. This aligns with the design and intended
use of work queues.
Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
---
drivers/gpu/drm/scheduler/sched_main.c | 137 ++++++++++++++++++-------
include/drm/gpu_scheduler.h | 8 +-
2 files changed, 106 insertions(+), 39 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index cede47afc800..b67469eac179 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1275,7 +1338,8 @@ EXPORT_SYMBOL(drm_sched_submit_ready);
void drm_sched_submit_stop(struct drm_gpu_scheduler *sched)
I was wondering what the scheduler teardown sequence looks like for
DRM_SCHED_POLICY_SINGLE_ENTITY and how XE does that.
In Nouveau, userspace can ask the kernel to create a channel (or multiple),
where each channel represents a ring feeding the firmware scheduler. Userspace
can forcefully close channels either via a dedicated IOCTL or by simply closing the
FD, which subsequently closes all channels opened through this FD.
When this happens the scheduler needs to be torn down. Without keeping track of
things in a driver-specific way, the only thing I could really come up with is the
following sequence.
/* Make sure no more jobs are fetched from the entity. */
drm_sched_submit_stop();
/* Wait for the channel to be idle, namely jobs in flight to complete. */
nouveau_channel_idle();
/* Stop the scheduler to free jobs from the pending_list. The ring must be idle at
 * this point, otherwise we might leak jobs. Feels more like a workaround to free
 * finished jobs.
 */
drm_sched_stop();
/* Free jobs from the entity queue. */
drm_sched_entity_fini();
/* Probably not even needed in this case. */
drm_sched_fini();
This doesn't look very straightforward though. I wonder if other drivers feeding
firmware schedulers have similar cases. Maybe something like drm_sched_teardown(),
which stops job submission, waits for pending jobs to finish and subsequently frees
them up, would make sense?
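To make this more concrete, below is a rough sketch of what I have in mind. It
ignores locking details and assumes the ring is already idle; everything it touches
(pending_list, job_list_lock, s_fence, the free_job() callback) already exists in
the scheduler, only drm_sched_teardown() itself is made up.

/* Rough sketch only - drm_sched_teardown() does not exist (yet). */
void drm_sched_teardown(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job, *tmp;
	LIST_HEAD(cancel_list);

	/* Make sure no more jobs are fetched from the entity. */
	drm_sched_submit_stop(sched);

	/* Take over whatever is still sitting on the pending_list. */
	spin_lock(&sched->job_list_lock);
	list_splice_init(&sched->pending_list, &cancel_list);
	spin_unlock(&sched->job_list_lock);

	/* Signal the pending jobs with -ECANCELED and free them right away. */
	list_for_each_entry_safe(job, tmp, &cancel_list, list) {
		list_del_init(&job->list);
		dma_fence_set_error(&job->s_fence->finished, -ECANCELED);
		dma_fence_signal(&job->s_fence->finished);
		sched->ops->free_job(job);
	}

	drm_sched_fini(sched);
}

This is basically the scheduler part of the sequence above folded into a single
helper.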
exec queue == gpu scheduler + entity in Xe
We kinda invented our own flow with reference counting + using the TDR for
cleanup.
Thanks for the detailed explanation. In case of making it driver-specific I
thought about something similar: pretty much the same reference counting, but
instead of the TDR, let jobs from the entity just return -ECANCELED from
run_job() and signal pending jobs with the same error code.
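On the driver side that'd look roughly like the snippet below; the names
(nouveau_sched_run_job(), to_nouveau_job(), job->chan->dead, nouveau_job_run())
are just placeholders rather than actual Nouveau code. If I read drm_sched_main()
correctly, an ERR_PTR return ends up as an error on the job's finished fence.

/* Placeholder names, not the actual Nouveau implementation. */
static struct dma_fence *
nouveau_sched_run_job(struct drm_sched_job *sched_job)
{
	struct nouveau_job *job = to_nouveau_job(sched_job);

	/* The channel is gone, don't touch the hardware anymore. */
	if (unlikely(job->chan->dead))
		return ERR_PTR(-ECANCELED);

	/* Normal path: push the job to the ring and return its HW fence. */
	return nouveau_job_run(job);
}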
On the other hand, I don't really want the scheduler and job structures to
potentially outlive the channel, which is why I think it'd be nice to avoid
consuming all the queued-up jobs from the entity in the first place, stop the
scheduler with drm_sched_submit_stop(), signal all pending jobs with
-ECANCELED and call the free_job() callbacks right away.
The latter I could probably do in Nouveau as well; however, it kinda feels
wrong to do all that within the driver.
Also, I was wondering how existing drivers using the GPU scheduler handle
that. It seems like they just rely on the scheduler's pending_list being empty
once drm_sched_fini() is called. Admittedly, that's pretty likely to hold,
since drm_sched_fini() is typically called on driver remove, but I don't see
how it's actually ensured. Am I missing something?
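As far as I can tell, the best a driver can do today is something along these
lines (just to illustrate the point, not a proposal):

	/* Nothing seems to guarantee this at this point. */
	WARN_ON(!list_empty(&sched->pending_list));
	drm_sched_fini(sched);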
We have a creation ref for the exec queue, plus each job takes a ref to
the exec queue. On exec queue close [1][2] (whether that be via IOCTL or FD
close) we drop the creation reference and call a vfunc for killing the
exec queue. The firmware implementation is here [3].
If you read through it, it just sets the TDR to the minimum value [4]. The
TDR kicks any running jobs off the hardware and signals their fences; any
jobs waiting on dependencies eventually flush out via run_job + TDR for
cleanup without going on the hardware. Once all jobs are flushed out the
exec queue reference count goes to zero, we trigger the exec queue cleanup
flow and finally free all memory for the exec queue.
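Condensed into a sketch (made-up names, glossing over the firmware backend -
the real code is in the links below), the refcounting side looks roughly like
this:

/* Illustrative sketch with made-up names, not the actual Xe code. */
struct exec_queue {
	struct kref refcount;		/* creation ref + one ref per job */
	struct drm_gpu_scheduler sched;
	struct drm_sched_entity entity;
};

static void exec_queue_cleanup(struct kref *ref)
{
	struct exec_queue *q = container_of(ref, struct exec_queue, refcount);

	/* Only runs once all jobs have flushed out and dropped their refs. */
	drm_sched_entity_fini(&q->entity);
	drm_sched_fini(&q->sched);
	kfree(q);
}

static void exec_queue_put(struct exec_queue *q)
{
	kref_put(&q->refcount, exec_queue_cleanup);
}

/* Called on the exec queue close IOCTL or on FD close. */
static void exec_queue_close(struct exec_queue *q)
{
	/*
	 * Fire the TDR immediately - in Xe the backend does this by setting
	 * the job timeout to the minimum value. The TDR kicks running jobs
	 * off the hardware and signals their fences; queued jobs flush out
	 * via run_job + TDR without touching the hardware, each dropping
	 * its exec queue reference in free_job.
	 */
	drm_sched_fault(&q->sched);

	/* Drop the creation reference; the last job reference frees the queue. */
	exec_queue_put(q);
}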
Using the TDR in this way is how we tear down an exec queue for other
reasons too (user page fault, user job timeout, job hang detected by
firmware, device reset, etc.).
This all works rather nicely and is a single code path for all of these
cases. I'm not sure if this can be made any more generic, nor do I really
see the need to (at least I don't see Xe needing a generic solution).
Hope this helps,
Matt
[1] https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_exec_queue.c#L911
[2] https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_device.c#L77
[3] https://gitlab.freedesktop.org/drm/xe/kernel/-/tree/drm-xe-next/drivers/gpu/drm/xe#L1184
[4] https://gitlab.freedesktop.org/drm/xe/kernel/-/tree/drm-xe-next/drivers/gpu/drm/xe#L789
- Danilo