On Mon, Apr 17, 2023 at 08:47:19AM +0200, Christian König wrote:
> Am 11.04.23 um 16:13 schrieb Daniel Vetter:
> > On Tue, Apr 11, 2023 at 11:02:55AM +0200, Christian König wrote:
> > > The point is that this not only requires some work in the drm_scheduler, but rather it then makes only little sense to use the drm_scheduler in the first place.
> > >
> > > The whole point of the drm_scheduler is to provide a dma_fence implementation for the submitted jobs.
> > >
> > > We also have dependency handling, but as Daniel and I said this can be easily extracted into a separate object/component.
> >
> > Uh, that's not what I meant. My take is that minimally patching drm/sched to make the out-fence either optional, or complete it right away, is the simplest way to get at the dependency handling. For me at least the major part of drm/sched is the dependency handling and timeout stuff. And the latter can be reused with some glue to handle preempt timeouts too and other things: since the TDR is a work struct, you can just issue any other GPU timeouts on the same workqueue, using roughly the same pattern as the ->timedout_job hook, and it'll just work.
>
> Well that strongly sounds like what I had in mind as well.
>
> Whether we move the dependency/timeout functionality into a new component or move the scheduler fence into a new component doesn't seem to matter; the high level goal is that we have separated the two functionalities, and both approaches will work for that.

Ah ok, I guess I just got confused about your wording then.

> > The entire "oh we also make sure your hw fence doesn't leak into public fences and cause lifetime mayhem" part seems pretty minor. And maybe also something we want to replicate for the preempt-ctx dma_fence that some long-running contexts need (but more as part of drm_sched_entity I guess).
> >
> > We can of course bikeshed how much flexibility really should be in the different parts of drm/sched, but imo that's a bikeshed.
>
> Well, the dependency handling in a separate component would still be interesting to have, since we need something similar for user queues as well.

Yeah it might be neater to refactor, but I think that part is optional, at least near-term. There's always room to polish shared code, and often it's better to do that once you have at least some in-tree users for the new need :-)
-Daniel

> Christian.
>
> > -Daniel
> >
> > > Regards,
> > > Christian.
> > >
> > > Am 07.04.23 um 02:20 schrieb Zeng, Oak:
> > > > So this series basically goes with option 2. The part of option 2 that makes me uncomfortable is: dma-fence doesn't work for long running workloads, so why do we generate it in the first place? As long as a dma-fence is generated, it will become a source of confusion in the future, no matter how much you annotate/document it. So if we decide to go with option 2, the bottom line is: don't generate a dma-fence for long running workloads during job submission. This requires some rework in the drm scheduler.
> > > >
> > > > The cleanest solution to me is option 3. Dma-fence is a very old technology. When it was created, no GPU supported page faults. Obviously it is not a good technology for modern GPUs with page fault support. I think the best way is to create a new scheduler and dependency tracking mechanism that works for both page-fault-enabled and page-fault-disabled contexts. I think this matches what Christian said below. Maybe nobody thinks this is easy?
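[A minimal sketch of Daniel's point above about reusing the TDR machinery: the scheduler's timeout is a delayed work item on a driver-suppliable workqueue, so a driver can queue its own GPU timeouts (e.g. a preempt timeout for long-running contexts) on that same workqueue and handle them in the same style as the timedout_job hook. The my_queue structure and function names are made up, and the two handlers are only serialized against each other if the driver passed an ordered workqueue to drm_sched_init().]

#include <linux/device.h>
#include <linux/workqueue.h>
#include <drm/gpu_scheduler.h>

struct my_queue {
	struct drm_gpu_scheduler sched;
	struct delayed_work preempt_tdr;	/* driver-specific timeout, not part of drm/sched */
};

static void my_preempt_timeout(struct work_struct *w)
{
	struct my_queue *q = container_of(w, struct my_queue, preempt_tdr.work);

	/* Same recovery pattern as drm_sched_backend_ops.timedout_job:
	 * stop the queue, kick/reset the firmware scheduler, resubmit or ban. */
	dev_warn(q->sched.dev, "preempt timed out\n");
}

static void my_queue_arm_preempt_timeout(struct my_queue *q)
{
	INIT_DELAYED_WORK(&q->preempt_tdr, my_preempt_timeout);
	/* Reuse the workqueue the scheduler already uses for its own TDR. */
	queue_delayed_work(q->sched.timeout_wq, &q->preempt_tdr, HZ);
}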
> > > >
> > > > Thanks,
> > > > Oak
> > > >
> > > > > -----Original Message-----
> > > > > From: Brost, Matthew <matthew.brost@xxxxxxxxx>
> > > > > Sent: April 5, 2023 2:53 PM
> > > > > To: Zeng, Oak <oak.zeng@xxxxxxxxx>
> > > > > Cc: Christian König <christian.koenig@xxxxxxx>; Vetter, Daniel <daniel.vetter@xxxxxxxxx>; Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx; intel-xe@xxxxxxxxxxxxxxxxxxxxx; robdclark@xxxxxxxxxxxx; airlied@xxxxxxxx; lina@xxxxxxxxxxxxx; boris.brezillon@xxxxxxxxxxxxx; faith.ekstrand@xxxxxxxxxxxxx
> > > > > Subject: Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans
> > > > >
> > > > > On Wed, Apr 05, 2023 at 12:06:53PM -0600, Zeng, Oak wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Using dma-fence for completion/dependency tracking for long-run workloads (more precisely, on-demand paging / page fault enabled workloads) can cause deadlock. This seems to be the significant issue here. Other issues, such as the drm scheduler completion order implication etc., are minor and can be solved inside the framework of the drm scheduler. We need to evaluate the paths below:
> > > > > >
> > > > > > 1) Still use the drm scheduler for job submission, and use dma-fence for job completion waiting/dependency tracking. This is the solution proposed in this series. Annotate dma-fences for long-run workloads: users can still wait on a dma-fence for job completion but can't wait on a dma-fence while holding any memory management locks. We still use dma-fence for dependency tracking, but it very easily runs into deadlock when on-demand paging is in the picture. The annotation helps us detect deadlocks but doesn't solve deadlock problems. Seems *not* a complete solution: it is almost impossible to completely avoid dependency deadlocks in a complex runtime environment.
> > > > >
> > > > > No one can wait on an LR fence, so it is impossible to deadlock. The annotations enforce this. Literally this is only for flow controlling the ring / holding pending jobs in the DRM scheduler list.
> > > > >
> > > > > > 2) Still use the drm scheduler but don't use dma-fence for completion signaling and dependency tracking. This way we still get some functions for free (reset, error handling, ring flow control as Matt said) from the drm scheduler, but push the dependency/completion tracking completely to user space using techniques such as user space fences. User space doesn't have a chance to wait on a fence while holding a kernel memory management lock, thus the dma-fence deadlock issue is solved.
> > > > >
> > > > > We use user space fences for syncs.
> > > > >
> > > > > > 3) Completely discard the drm scheduler and dma-fence for long-run workloads. Use a user queue/doorbell for super fast submission, directly interacting with the fw scheduler. Use user fences for completion/dependency tracking.
> > > > >
> > > > > This is a hard no from me, I want 1 submission path in Xe. Either we use the DRM scheduler or we don't.
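[For reference, the "user space fence" that options 2 and 3 both lean on is, in rough terms, just a seqno written by the GPU or KMD into memory mapped into the user address space; userspace treats it as signaled once the value reaches a target. A hypothetical sketch, not any driver's actual uAPI:]

#include <stdint.h>
#include <stdatomic.h>

struct user_fence {
	_Atomic uint64_t *addr;	/* GPU-visible memory mapped into userspace */
	uint64_t value;		/* fence counts as signaled once *addr >= value */
};

static int user_fence_signaled(const struct user_fence *uf)
{
	/* Userspace polls/waits on this without ever touching a dma-fence,
	 * so none of the dma-fence signaling rules apply. */
	return atomic_load_explicit(uf->addr, memory_order_acquire) >= uf->value;
}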
> > > > >
> > > > > Matt
> > > > >
> > > > > > Thanks,
> > > > > > Oak
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Christian König <christian.koenig@xxxxxxx>
> > > > > > > Sent: April 5, 2023 3:30 AM
> > > > > > > To: Brost, Matthew <matthew.brost@xxxxxxxxx>; Zeng, Oak <oak.zeng@xxxxxxxxx>
> > > > > > > Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; intel-xe@xxxxxxxxxxxxxxxxxxxxx; robdclark@xxxxxxxxxxxx; thomas.hellstrom@xxxxxxxxxxxxxxx; airlied@xxxxxxxx; lina@xxxxxxxxxxxxx; boris.brezillon@xxxxxxxxxxxxx; faith.ekstrand@xxxxxxxxxxxxx
> > > > > > > Subject: Re: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans
> > > > > > >
> > > > > > > Am 04.04.23 um 20:08 schrieb Matthew Brost:
> > > > > > > > On Tue, Apr 04, 2023 at 12:02:03PM -0600, Zeng, Oak wrote:
> > > > > > > > > Hi Matt, Thomas,
> > > > > > > > >
> > > > > > > > > Some very bold out-of-box thinking in this area:
> > > > > > > > >
> > > > > > > > > 1. So you want to use the drm scheduler and dma-fence for long running workloads. Why do you want to do this in the first place? What is the benefit? The drm scheduler is pretty much a software scheduler. Modern GPUs have a scheduler built at the fw/hw level; as you said below, for Intel this is the GuC. Can the xe driver just directly submit jobs to the GuC, bypassing the drm scheduler?
> > > > > > > >
> > > > > > > > If we did that we'd now have 2 paths for dependency tracking, flow controlling the ring, resets / error handling / backend submission implementations. We don't want this.
> > > > > > >
> > > > > > > Well exactly that's the point: Why?
> > > > > > >
> > > > > > > As far as I can see those are two completely distinct use cases, so you absolutely do want two completely distinct implementations for this.
> > > > > > >
> > > > > > > > > 2. Using dma-fence for long run workloads: I am well aware that a page fault (and the consequent memory allocation / lock acquiring to fix the fault) can cause deadlock for a dma-fence wait. But I am not convinced that dma-fence can't be used purely because of the nature of the workload, that it runs very long (indefinitely). I did the math: the dma_fence_wait_timeout function's third param is the timeout, which is a signed long type. If HZ is 1000, this is about 23 days. If 23 days is not long enough, can we just change the timeout parameter to signed 64 bits so it is much longer than our lifetime...
> > > > > > > > >
> > > > > > > > > So I mainly argue that we can't use dma-fence for long-run workloads not because the workload runs very long, but rather because of the fact that we use page faults for long-run workloads. If we enable page faults for short-run workloads, we can't use dma-fence either. Page fault is the key thing here.
> > > > > > > > >
> > > > > > > > > Now, since we use page faults, which are *fundamentally* at odds with the dma-fence design, why not just introduce an independent concept such as a user-fence instead of extending the existing dma-fence?
> > > > > > > > >
> > > > > > > > > I like a unified design. If the drm scheduler and dma-fence can be extended to work for everything, that's beautiful. But it seems we have some fundamental problem here.
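[A quick sanity check of the "about 23 days" figure above, assuming HZ == 1000 and a 32-bit signed long for the jiffies timeout; on an LP64 kernel, long is already 64 bits, so the practical limit is far larger. Plain userspace arithmetic, not kernel code:]

#include <stdio.h>

int main(void)
{
	const double hz = 1000.0;			/* jiffies per second */
	const double max_jiffies = 2147483647.0;	/* LONG_MAX for a 32-bit long */

	printf("%.1f days\n", max_jiffies / hz / 3600.0 / 24.0);	/* ~24.9 days */
	return 0;
}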
> > > > > > > >
> > > > > > > > Thomas's patches turn a dma-fence into a KMD sync point (e.g. we just use the signal / CB infrastructure) and enforce that we don't use these dma-fences from the scheduler in memory reclaim paths or export them to user space or other drivers. Think of this mode as a SW-only fence.
> > > > > > >
> > > > > > > Yeah, and I truly think this is a really bad idea.
> > > > > > >
> > > > > > > The signal/CB infrastructure in the dma_fence turned out to be the absolute nightmare I initially predicted. Sorry to say that, but in this case the "I've told you so" is appropriate in my opinion.
> > > > > > >
> > > > > > > If we need infrastructure for long running dependency tracking we should encapsulate that in a new framework and not try to mangle the existing code for something it was never intended for.
> > > > > > >
> > > > > > > Christian.
> > > > > > >
> > > > > > > > Matt
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Oak
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: dri-devel <dri-devel-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Matthew Brost
> > > > > > > > > > Sent: April 3, 2023 8:22 PM
> > > > > > > > > > To: dri-devel@xxxxxxxxxxxxxxxxxxxxx; intel-xe@xxxxxxxxxxxxxxxxxxxxx
> > > > > > > > > > Cc: robdclark@xxxxxxxxxxxx; thomas.hellstrom@xxxxxxxxxxxxxxx; airlied@xxxxxxxx; lina@xxxxxxxxxxxxx; boris.brezillon@xxxxxxxxxxxxx; Brost, Matthew <matthew.brost@xxxxxxxxx>; christian.koenig@xxxxxxx; faith.ekstrand@xxxxxxxxxxxxx
> > > > > > > > > > Subject: [RFC PATCH 00/10] Xe DRM scheduler and long running workload plans
> > > > > > > > > >
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we have been asked to merge our common DRM scheduler patches first as well as to develop a common solution for long running workloads with the DRM scheduler. This RFC series is our first attempt at doing this. We welcome any and all feedback.
> > > > > > > > > >
> > > > > > > > > > This can be thought of as 4 parts, detailed below.
> > > > > > > > > >
> > > > > > > > > > - DRM scheduler changes for a 1 to 1 relationship between scheduler and entity (patches 1-3)
> > > > > > > > > >
> > > > > > > > > > In Xe all of the scheduling of jobs is done by a firmware scheduler (the GuC), which is a new paradigm WRT the DRM scheduler and presents several problems, as the DRM scheduler was originally designed to schedule jobs on hardware queues. The main problem is that the DRM scheduler expects the submission order of jobs to be the completion order of jobs, even across multiple entities. This assumption falls apart with a firmware scheduler, as a firmware scheduler has no concept of jobs and jobs can complete out of order. A novel solution for this was originally thought of by Faith during the initial prototype of Xe: create a 1 to 1 relationship between scheduler and entity. I believe the AGX driver [3] is using this approach and Boris may use this approach as well for the Mali driver [4].
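[On the driver side, the 1 to 1 relationship described above boils down to something like the following (the my_exec_queue structure is illustrative, not Xe's actual type): each firmware-managed queue owns its own scheduler with exactly one entity, so drm/sched degenerates into a per-queue dependency and in-flight job tracker. Since each scheduler only ever sees one entity, the submission-order == completion-order assumption only has to hold within a single queue, which a single firmware context typically does guarantee.]

#include <drm/gpu_scheduler.h>

struct my_exec_queue {
	struct drm_gpu_scheduler sched;		/* one scheduler ...            */
	struct drm_sched_entity entity;		/* ... with exactly one entity  */
	u32 fw_ctx_id;				/* handle for the firmware (GuC) scheduler */
};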
> > > > > > > > > >
> > > > > > > > > > To support a 1 to 1 relationship we move the main execution function from a kthread to a work queue and add a new scheduling mode which bypasses code in the DRM scheduler which isn't needed in a 1 to 1 relationship. The new scheduling mode should unify all drivers' usage with a 1 to 1 relationship and can be thought of as using the scheduler as a dependency / inflight job tracker rather than a true scheduler.
> > > > > > > > > >
> > > > > > > > > > - Generic messaging interface for DRM scheduler
> > > > > > > > > >
> > > > > > > > > > The idea is to be able to communicate with the submission backend via in-band (relative to the main execution function) messages. Messages are backend defined and flexible enough for any use case. In Xe we use these messages to clean up entities, set properties for entities, and suspend / resume execution of an entity [5]. I suspect other drivers can leverage this messaging concept too, as it is a convenient way to avoid races in the backend.
> > > > > > > > > >
> > > > > > > > > > - Support for using TDR for all error paths of a scheduler / entity
> > > > > > > > > >
> > > > > > > > > > Fix a few races / bugs, add a function to dynamically set the TDR timeout.
> > > > > > > > > >
> > > > > > > > > > - Annotate dma-fences for long running workloads.
> > > > > > > > > >
> > > > > > > > > > The idea here is to use dma-fences only as sync points within the scheduler and never export them for long running workloads. By annotating these fences as long running we ensure that these dma-fences are never used in a way that breaks the dma-fence rules. A benefit of this approach is that the scheduler can still safely flow control the execution ring buffer via the job limit without breaking the dma-fence rules.
> > > > > > > > > >
> > > > > > > > > > Again, this is a first draft and we are looking forward to feedback.
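[The "generic messaging interface" above could look roughly like this; all names are made up and the actual patch in the series may differ. The point is that backend-defined messages are queued to the scheduler and drained in-band by its main work item, so they are ordered against job submission without extra locking in the backend.]

#include <linux/list.h>

struct my_sched_msg {
	struct list_head link;		/* queued onto the scheduler's message list */
	unsigned int opcode;		/* backend-defined: CLEANUP, SUSPEND, RESUME, ... */
	void *private_data;		/* backend-defined payload */
};

/* The backend would then get a ->process_msg() style hook called between jobs. */

[And one possible reading of the dma-fence annotation, and of Matt's "KMD sync point" earlier in the thread, is a fence flag plus checks at every export path. The flag name and check site below are assumptions, not the RFC's actual implementation:]

#include <linux/dma-fence.h>

#define MY_FENCE_FLAG_LONG_RUNNING	(DMA_FENCE_FLAG_USER_BITS + 0)

static inline bool my_fence_is_long_running(struct dma_fence *f)
{
	return test_bit(MY_FENCE_FLAG_LONG_RUNNING, &f->flags);
}

/* In any path that would add the fence to a dma_resv, sync_file or syncobj:
 *
 *	if (my_fence_is_long_running(fence))
 *		return -EINVAL;	 (never visible to userspace or other drivers)
 */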
> > > > > > > > > >
> > > > > > > > > > Enjoy - Matt
> > > > > > > > > >
> > > > > > > > > > [1] https://gitlab.freedesktop.org/drm/xe/kernel
> > > > > > > > > > [2] https://patchwork.freedesktop.org/series/112188/
> > > > > > > > > > [3] https://patchwork.freedesktop.org/series/114772/
> > > > > > > > > > [4] https://patchwork.freedesktop.org/patch/515854/?series=112188&rev=1
> > > > > > > > > > [5] https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_guc_submit.c#L1031
> > > > > > > > > >
> > > > > > > > > > Matthew Brost (8):
> > > > > > > > > >   drm/sched: Convert drm scheduler to use a work queue rather than kthread
> > > > > > > > > >   drm/sched: Move schedule policy to scheduler / entity
> > > > > > > > > >   drm/sched: Add DRM_SCHED_POLICY_SINGLE_ENTITY scheduling policy
> > > > > > > > > >   drm/sched: Add generic scheduler message interface
> > > > > > > > > >   drm/sched: Start run wq before TDR in drm_sched_start
> > > > > > > > > >   drm/sched: Submit job before starting TDR
> > > > > > > > > >   drm/sched: Add helper to set TDR timeout
> > > > > > > > > >   drm/syncobj: Warn on long running dma-fences
> > > > > > > > > >
> > > > > > > > > > Thomas Hellström (2):
> > > > > > > > > >   dma-buf/dma-fence: Introduce long-running completion fences
> > > > > > > > > >   drm/sched: Support long-running sched entities
> > > > > > > > > >
> > > > > > > > > >  drivers/dma-buf/dma-fence.c                 | 142 +++++++---
> > > > > > > > > >  drivers/dma-buf/dma-resv.c                  |   5 +
> > > > > > > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  14 +-
> > > > > > > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  15 +-
> > > > > > > > > >  drivers/gpu/drm/drm_syncobj.c               |   5 +-
> > > > > > > > > >  drivers/gpu/drm/etnaviv/etnaviv_sched.c     |   5 +-
> > > > > > > > > >  drivers/gpu/drm/lima/lima_sched.c           |   5 +-
> > > > > > > > > >  drivers/gpu/drm/msm/adreno/adreno_device.c  |   6 +-
> > > > > > > > > >  drivers/gpu/drm/msm/msm_ringbuffer.c        |   5 +-
> > > > > > > > > >  drivers/gpu/drm/panfrost/panfrost_job.c     |   5 +-
> > > > > > > > > >  drivers/gpu/drm/scheduler/sched_entity.c    | 127 +++++++--
> > > > > > > > > >  drivers/gpu/drm/scheduler/sched_fence.c     |   6 +-
> > > > > > > > > >  drivers/gpu/drm/scheduler/sched_main.c      | 278 +++++++++++++++-----
> > > > > > > > > >  drivers/gpu/drm/v3d/v3d_sched.c             |  25 +-
> > > > > > > > > >  include/drm/gpu_scheduler.h                 | 130 +++++++--
> > > > > > > > > >  include/linux/dma-fence.h                   |  60 ++++-
> > > > > > > > > >  16 files changed, 649 insertions(+), 184 deletions(-)
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.34.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch