On Tue, Apr 04, 2023 at 10:07:48AM +0900, Asahi Lina wrote:
> Hi, thanks for the Cc!
>

No problem.

> On 04/04/2023 09.22, Matthew Brost wrote:
> > Hello,
> >
> > As a prerequisite to merging the new Intel Xe DRM driver [1] [2], we
> > have been asked to merge our common DRM scheduler patches first as
> > well as develop a common solution for long running workloads with
> > the DRM scheduler. This RFC series is our first attempt at doing
> > this. We welcome any and all feedback.
> >
> > This can be thought of as 4 parts, detailed below.
> >
> > - DRM scheduler changes for a 1 to 1 relationship between scheduler
> >   and entity (patches 1-3)
> >
> > In Xe, all scheduling of jobs is done by a firmware scheduler (the
> > GuC), which is a new paradigm WRT the DRM scheduler and presents
> > several problems, as the DRM scheduler was originally designed to
> > schedule jobs on hardware queues. The main problem is that the DRM
> > scheduler expects the submission order of jobs to be the completion
> > order of jobs, even across multiple entities. This assumption falls
> > apart with a firmware scheduler, as a firmware scheduler has no
> > concept of jobs and jobs can complete out of order. A novel solution
> > was originally thought of by Faith during the initial prototype of
> > Xe: create a 1 to 1 relationship between scheduler and entity. I
> > believe the AGX driver [3] is using this approach and Boris may use
> > this approach as well for the Mali driver [4].
> >
> > To support a 1 to 1 relationship we move the main execution function
> > from a kthread to a work queue and add a new scheduling mode which
> > bypasses code in the DRM scheduler that isn't needed in a 1 to 1
> > relationship. The new scheduling mode should unify all drivers'
> > usage with a 1 to 1 relationship and can be thought of as using the
> > scheduler as a dependency / inflight job tracker rather than a true
> > scheduler.
>
> Yup, we're in the exact same situation with drm/asahi, so this is very
> welcome! We've been using the existing scheduler as-is, but this
> should help remove some unneeded complexity in this use case.
>

That's the idea.

> Do you want me to pull this series into our tree and make sure this
> all works out for us?
>

We tested this in Xe and it definitely works for us, but the more
testing the better.

> I also have a couple of bugfixes for drm/sched I need to send out, but
> I think the rebase/merge with this series should be trivial. I'll send
> that out this week.
>
> > - Generic messaging interface for DRM scheduler
> >
> > The idea is to be able to communicate with the submission backend
> > via in-band (relative to the main execution function) messages.
> > Messages are backend defined and flexible enough for any use case.
> > In Xe we use these messages to clean up entities, set properties for
> > entities, and suspend / resume execution of an entity [5]. I suspect
> > other drivers can leverage this messaging concept too, as it is a
> > convenient way to avoid races in the backend.
>
> We haven't needed this so far (mostly by using fine-grained locking
> and refcounting all over the place) but I can see it being useful to
> simplify some of those constructs and maybe avoid potential deadlocks
> in some places. I'm not sure yet whether we can fully get rid of the
> main queue refcounting/locking (our completion/error signaling path
> doesn't map well to DMA fences directly so we still need something
> there to get from the global GPU completion signaling thread to
> individual queues) but it might be a step in the right direction at
> least!
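
To make the 1 to 1 mode concrete, driver-side setup looks roughly like
the below. This is only a sketch written against the current upstream
drm_sched_init() / drm_sched_entity_init() signatures (the series
tweaks these, e.g. for the work queue and scheduling policy), and
fw_queue is a made-up type standing in for a driver's firmware queue:

#include <drm/gpu_scheduler.h>
#include <linux/jiffies.h>

/*
 * Sketch only: one drm_gpu_scheduler per entity, so drm_sched acts as
 * a dependency / inflight job tracker while the firmware scheduler
 * does the actual scheduling. "fw_queue" is a made-up driver type and
 * the arguments follow current upstream, which this series modifies,
 * so treat the details as illustrative.
 */
struct fw_queue {
	struct drm_gpu_scheduler sched;
	struct drm_sched_entity entity;
};

static int fw_queue_init(struct fw_queue *q, struct device *dev,
			 const struct drm_sched_backend_ops *ops)
{
	struct drm_gpu_scheduler *sched_list[] = { &q->sched };
	int err;

	/* One scheduler instance per queue... */
	err = drm_sched_init(&q->sched, ops,
			     64,			/* hw_submission */
			     0,				/* hang_limit */
			     msecs_to_jiffies(5000),	/* timeout */
			     NULL,			/* timeout_wq */
			     NULL,			/* score */
			     "fw_queue", dev);
	if (err)
		return err;

	/* ...fed by exactly one entity. */
	err = drm_sched_entity_init(&q->entity, DRM_SCHED_PRIORITY_NORMAL,
				    sched_list, 1, NULL /* guilty */);
	if (err)
		drm_sched_fini(&q->sched);

	return err;
}

Since each scheduler only ever sees one entity's jobs, it never has to
arbitrate between entities, which is why ordering assumptions across
entities stop being a problem.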
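
On the messaging interface, the core of it is roughly the following.
Names approximate the patch in this series and the opcodes / handler
are backend defined, so take this as a sketch of the shape rather than
the final API:

#include <linux/slab.h>

/*
 * Rough shape of the message added by this series (names approximate,
 * see the actual patch). A message is queued to a scheduler and run by
 * a backend hook from the main execution context, i.e. ordered with
 * run_job(), which is what avoids races in the backend.
 */
struct drm_sched_msg {
	struct list_head link;
	void *private_data;	/* backend-defined payload */
	unsigned int opcode;	/* backend-defined operation */
};

/* Driver-defined opcodes, e.g. what Xe uses messages for: */
enum fw_queue_msg_op {
	FW_QUEUE_MSG_CLEANUP,
	FW_QUEUE_MSG_SET_PROPERTY,
	FW_QUEUE_MSG_SUSPEND,
	FW_QUEUE_MSG_RESUME,
};

/*
 * Suspending execution of an entity, reusing the made-up fw_queue from
 * the previous sketch; drm_sched_add_msg() is the series' queuing
 * function (again, modulo naming).
 */
static void fw_queue_suspend(struct fw_queue *q)
{
	struct drm_sched_msg *msg = kzalloc(sizeof(*msg), GFP_KERNEL);

	if (!msg)
		return;

	msg->opcode = FW_QUEUE_MSG_SUSPEND;
	msg->private_data = q;

	/* Processed in-band by the backend's message handler. */
	drm_sched_add_msg(&q->sched, msg);
}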
With this messaging interface we essentially have a lockless submission
backend, which is really nice compared to what we did in the i915.

Matt

> ~~ Lina
>