Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans

Am 04.04.23 um 15:37 schrieb Matthew Brost:
On Tue, Apr 04, 2023 at 11:13:28AM +0200, Christian König wrote:
Hi,

Am 04.04.23 um 02:22 schrieb Matthew Brost:
Hello,

As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we
have been asked to merge our common DRM scheduler patches first as well
as develop a common solution for long running workloads with the DRM
scheduler. This RFC series is our first attempt at doing this. We
welcome any and all feedback.

This can be thought of as four parts, detailed below.

- DRM scheduler changes for 1 to 1 relationship between scheduler and
entity (patches 1-3)

In Xe all of the scheduling of jobs is done by a firmware scheduler (the
GuC), which is a new paradigm WRT the DRM scheduler and presents several
problems, as the DRM scheduler was originally designed to schedule jobs on
hardware queues. The main problem is that the DRM scheduler expects the
submission order of jobs to be the completion order of jobs even across
multiple entities. This assumption falls apart with a firmware scheduler,
as a firmware scheduler has no concept of jobs and jobs can complete out
of order. A novel solution was originally thought of by Faith during the
initial prototype of Xe: create a 1 to 1 relationship between scheduler
and entity. I believe the AGX driver [3] is using this approach and
Boris may use this approach as well for the Mali driver [4].

To support a 1 to 1 relationship we move the main execution function
from a kthread to a work queue and add a new scheduling mode which
bypasses code in the DRM scheduler which isn't needed in a 1 to 1
relationship. The new scheduling mode should unify usage across all
drivers with a 1 to 1 relationship and can be thought of as using the
scheduler as a dependency / inflight job tracker rather than a true
scheduler.
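
As a rough illustration of the 1 to 1 model from a driver's point of view,
consider the sketch below. The struct and function names are made up for
this example, and the drm_sched_init() call is deliberately elided since
its exact post-series signature (policy, run work queue, etc.) is defined
by the patches themselves.

#include <drm/gpu_scheduler.h>

/*
 * Sketch of the 1 to 1 model: each user queue owns both a scheduler and
 * the single entity feeding it, so the scheduler acts as a dependency /
 * inflight-job tracker for that one queue rather than a true scheduler.
 */
struct example_exec_queue {
	struct drm_gpu_scheduler sched;		/* one scheduler ...             */
	struct drm_sched_entity entity;		/* ... fed by exactly one entity */
};

static int example_exec_queue_init(struct example_exec_queue *q)
{
	struct drm_gpu_scheduler *sched_list[] = { &q->sched };

	/*
	 * drm_sched_init() would be called here first with the series'
	 * new arguments (single-entity policy, run work queue, ...);
	 * elided because the exact signature comes from the patches.
	 */

	/* The single entity is bound to its single scheduler. */
	return drm_sched_entity_init(&q->entity, DRM_SCHED_PRIORITY_NORMAL,
				     sched_list, ARRAY_SIZE(sched_list), NULL);
}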

- Generic messaging interface for DRM scheduler

The idea is to be able to communicate with the submission backend via
in-band messages (relative to the main execution function). Messages are
backend defined and flexible enough for any use case. In Xe we use these
messages to clean up entities, set properties for entities, and suspend /
resume execution of an entity [5]. I suspect other drivers can leverage
this messaging concept too, as it is a convenient way to avoid races in
the backend.
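
A minimal sketch of the concept (the struct, opcode, and function names
below are illustrative only, not the API added by the "drm/sched: Add
generic scheduler message interface" patch): the backend defines its own
opcodes and payloads, and the message is processed from the same ordered
worker that runs run_job, so it is naturally serialized against submission.

#include <drm/gpu_scheduler.h>

/* Illustrative backend-defined message type. */
enum backend_msg_op {
	BACKEND_MSG_CLEANUP_ENTITY,
	BACKEND_MSG_SET_PRIORITY,
	BACKEND_MSG_SUSPEND,
	BACKEND_MSG_RESUME,
};

struct backend_msg {
	enum backend_msg_op op;
	void *payload;		/* e.g. the entity or the new property value */
};

/*
 * Called from the scheduler's ordered work queue, i.e. never concurrently
 * with run_job for the same scheduler, so no extra lock is needed to
 * mutate entity state here.
 */
static void backend_process_msg(struct drm_gpu_scheduler *sched,
				struct backend_msg *msg)
{
	switch (msg->op) {
	case BACKEND_MSG_CLEANUP_ENTITY:
		/* tear down firmware context, release doorbell, ... */
		break;
	case BACKEND_MSG_SET_PRIORITY:
	case BACKEND_MSG_SUSPEND:
	case BACKEND_MSG_RESUME:
		/* forward to the firmware scheduler (e.g. the GuC) */
		break;
	}
}
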
Oh, please absolutely *don't* do this.

This is basically the design which makes a bunch of stuff so horribly broken
on Windows.

I can explain it in more detail if necessary, but I strongly recommend not
going down this path.

I'm afraid we are going to have to discuss this further. Let me explain
my reasoning: basically, the idea is to have a single main entry point to
the backend - the work queue. This avoids the need for a lock between
run_job and any message that changes an entity's state, and it also really
helps during the reset flows (either TDR or GT reset), as we can call
drm_sched_run_wq_stop and ensure that nothing else in the backend is
changing an entity's state. It all works out really nicely actually; our
GuC backend is incredibly stable (hasn't really had a bug pop up in about
a year) and way simpler than what we did in the i915. I think the
simplicity is largely due to this design of limiting the entry points.

I personally don't see how this is a poor design; limiting entry points
absolutely makes sense to me. If it didn't, why not just call cleanup_job,
bypassing the main execution thread (now a worker)? This is the exact same
concept.

Well, then I strongly suggest reading a few analyses of the failure of the message processing loop on Windows.

Have you ever wondered why classic Win32 applications sometimes seem to be stuck and don't do anything? This design pattern combined with timeouts to solve deadlocks is the reason for that.

The major problem with this approach is that analysis tools like lockdep have a hard time grasping the dependencies.

What you can do is offload all the operations which are supposed to run in the same thread as work items into a work queue. This is something lockdep understands, and it is able to scream out loud if someone messes up the deadlock dependencies.
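
Concretely, that kind of offload is plain work items queued on an ordered
workqueue, roughly like the sketch below (the names are illustrative and
not tied to any patch in this series):

#include <linux/slab.h>
#include <linux/workqueue.h>

struct entity_op {
	struct work_struct work;
	/* operation-specific data would go here */
};

static void entity_op_func(struct work_struct *work)
{
	struct entity_op *op = container_of(work, struct entity_op, work);

	/*
	 * Mutate the entity state here; any locks taken are visible to
	 * lockdep in the context of this work item, as are the
	 * flush/cancel points elsewhere.
	 */
	kfree(op);
}

static int queue_entity_op(struct workqueue_struct *ordered_wq)
{
	struct entity_op *op = kzalloc(sizeof(*op), GFP_KERNEL);

	if (!op)
		return -ENOMEM;

	INIT_WORK(&op->work, entity_op_func);
	queue_work(ordered_wq, &op->work);	/* serialized on an ordered wq */
	return 0;
}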

Regards,
Christian.


FWIW, Asahi liked the idea as well and thinks it could be useful for AGX.
Matt

Regards,
Christian.

- Support for using TDR for all error paths of a scheduler / entity

Fix a few races / bugs and add a function to dynamically set the TDR timeout.
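
For illustration only, such a helper might look roughly like the sketch
below; the helper name and the locking shown are assumptions, not
necessarily what the "drm/sched: Add helper to set TDR timeout" patch does.

#include <drm/gpu_scheduler.h>

/* Hypothetical shape of a dynamic TDR timeout setter; sketch only. */
static void example_sched_set_timeout(struct drm_gpu_scheduler *sched,
				      long timeout)
{
	/*
	 * Serialize against the timeout work being (re)queued for pending
	 * jobs so the new value is picked up consistently.
	 */
	spin_lock(&sched->job_list_lock);
	sched->timeout = timeout;
	spin_unlock(&sched->job_list_lock);
}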

- Annotate dma-fences for long running workloads.

The idea here is to use dma-fences only as sync points within the
scheduler and never export them for long running workloads. By
annotating these fences as long running we ensure that these dma-fences
are never used in a way that breaks the dma-fence rules. A benefit of
this approach is that the scheduler can still safely flow control the
execution ring buffer via the job limit without breaking the dma-fence
rules.
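
Purely to illustrate the annotation idea (the flag and helper names below
are assumptions, not what the "dma-buf/dma-fence: Introduce long-running
completion fences" patch actually adds): the fence carries a long-running
mark, and the driver refuses to use it where the dma-fence rules would
otherwise be broken, e.g. installing it into a dma-resv.

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>

/*
 * Illustrative only: reuse the first driver-available flag bit to mark a
 * fence as long running. The real series introduces its own annotation.
 */
#define EXAMPLE_FENCE_FLAG_LONG_RUNNING_BIT	DMA_FENCE_FLAG_USER_BITS

static void example_fence_mark_long_running(struct dma_fence *fence)
{
	set_bit(EXAMPLE_FENCE_FLAG_LONG_RUNNING_BIT, &fence->flags);
}

static bool example_fence_is_long_running(struct dma_fence *fence)
{
	return test_bit(EXAMPLE_FENCE_FLAG_LONG_RUNNING_BIT, &fence->flags);
}

/*
 * Example check before installing a fence into a dma-resv: long running
 * fences must never end up where other drivers can wait on them.
 */
static int example_resv_add_checked(struct dma_resv *resv,
				    struct dma_fence *fence,
				    enum dma_resv_usage usage)
{
	if (WARN_ON(example_fence_is_long_running(fence)))
		return -EINVAL;

	dma_resv_add_fence(resv, fence, usage);
	return 0;
}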

Again, this is a first draft and we are looking forward to feedback.

Enjoy - Matt

[1] https://gitlab.freedesktop.org/drm/xe/kernel
[2] https://patchwork.freedesktop.org/series/112188/
[3] https://patchwork.freedesktop.org/series/114772/
[4] https://patchwork.freedesktop.org/patch/515854/?series=112188&rev=1
[5] https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_guc_submit.c#L1031

Matthew Brost (8):
    drm/sched: Convert drm scheduler to use a work queue rather than
      kthread
    drm/sched: Move schedule policy to scheduler / entity
    drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
    drm/sched: Add generic scheduler message interface
    drm/sched: Start run wq before TDR in drm_sched_start
    drm/sched: Submit job before starting TDR
    drm/sched: Add helper to set TDR timeout
    drm/syncobj: Warn on long running dma-fences

Thomas Hellström (2):
    dma-buf/dma-fence: Introduce long-running completion fences
    drm/sched: Support long-running sched entities

   drivers/dma-buf/dma-fence.c                 | 142 +++++++---
   drivers/dma-buf/dma-resv.c                  |   5 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  14 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  15 +-
   drivers/gpu/drm/drm_syncobj.c               |   5 +-
   drivers/gpu/drm/etnaviv/etnaviv_sched.c     |   5 +-
   drivers/gpu/drm/lima/lima_sched.c           |   5 +-
   drivers/gpu/drm/msm/adreno/adreno_device.c  |   6 +-
   drivers/gpu/drm/msm/msm_ringbuffer.c        |   5 +-
   drivers/gpu/drm/panfrost/panfrost_job.c     |   5 +-
   drivers/gpu/drm/scheduler/sched_entity.c    | 127 +++++++--
   drivers/gpu/drm/scheduler/sched_fence.c     |   6 +-
   drivers/gpu/drm/scheduler/sched_main.c      | 278 +++++++++++++++-----
   drivers/gpu/drm/v3d/v3d_sched.c             |  25 +-
   include/drm/gpu_scheduler.h                 | 130 +++++++--
   include/linux/dma-fence.h                   |  60 ++++-
   16 files changed, 649 insertions(+), 184 deletions(-)




