On Mon, 2024-12-30 at 16:52 +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
>
> <tldr>
> Replacing FIFO with a flavour of deadline driven scheduling and
> removing round-robin. Connecting the scheduler with dma-fence
> deadlines. First draft and testing by different drivers and feedback
> would be nice. I was only able to test it with amdgpu. Other drivers
> may not even compile.
> </tldr>
>
> If I remember correctly Christian mentioned recently (give or take)
> that maybe round-robin could be removed. That got me thinking how and
> what could be improved and simplified. So I played a bit in the
> scheduler code and came up with something which appears to not crash
> at least. Whether or not there are significant advantages apart from
> maybe code consolidation and reduction is the main thing to be
> determined.
>
> One big question is whether round-robin can really be removed. Does
> anyone use it, rely on it, or what are even use cases where it is much
> better than FIFO.
>
> See "drm/sched: Add deadline policy" commit message for a short
> description on what flavour of deadline scheduling it is. But in
> essence it should be a more fair FIFO where higher priority can not
> forever starve lower priorities.
>
> "drm/sched: Connect with dma-fence deadlines" wires up dma-fence
> deadlines to the scheduler because it is easy and makes logical sense
> with this. And I noticed userspace already uses it so why not wire it
> up fully.
>
> Otherwise the series is a bit of progression from consolidating RR
> into FIFO code paths and going from there to deadline and then to a
> change in how dependencies are handled. And code simplification to 1:1
> run queue to scheduler relationship, because deadline does not need
> per priority run queues.
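To make the "more fair FIFO" idea above concrete, here is a minimal user-space model of one possible deadline policy. It is a sketch under stated assumptions, not the code from the series: the priority levels, the offsets in prio_offset_us, struct job, job_set_deadline(), job_apply_fence_deadline() and pick_next() are all hypothetical. It assumes each job's deadline is its submission time plus an offset that shrinks as priority rises, that a single run queue always picks the earliest deadline, and that a fence deadline hint (in the kernel such a hint can arrive via dma_fence_set_deadline()) only ever pulls a job's deadline earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical priority levels, highest first. */
enum prio { PRIO_KERNEL, PRIO_HIGH, PRIO_NORMAL, PRIO_LOW, PRIO_COUNT };

/*
 * Hypothetical deadline offsets in microseconds: the higher the
 * priority, the sooner after submission the job falls due.
 */
static const uint64_t prio_offset_us[PRIO_COUNT] = { 0, 1000, 5000, 20000 };

struct job {
	enum prio prio;
	uint64_t submit_us;   /* submission timestamp */
	uint64_t deadline_us; /* derived scheduling key */
};

/* Assumed policy: deadline = submission time + per-priority offset. */
static void job_set_deadline(struct job *job)
{
	assert(job->prio < PRIO_COUNT);
	job->deadline_us = job->submit_us + prio_offset_us[job->prio];
}

/*
 * Model of a deadline hint from a fence waiter: only ever move the
 * job's deadline earlier, never later.
 */
static void job_apply_fence_deadline(struct job *job, uint64_t hint_us)
{
	if (hint_us < job->deadline_us)
		job->deadline_us = hint_us;
}

/*
 * Single run queue: return the index of the job with the earliest
 * deadline (ties broken by earlier submission), or -1 if empty.
 */
static int pick_next(const struct job *q, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (best < 0 ||
		    q[i].deadline_us < q[best].deadline_us ||
		    (q[i].deadline_us == q[best].deadline_us &&
		     q[i].submit_us < q[best].submit_us))
			best = (int)i;
	}
	return best;
}
```

Because priority is folded into the deadline in this model, one queue ordered by deadline is enough, which is in line with the point above that per-priority run queues are no longer needed. A low-priority job submitted at time 0 (deadline 20000) loses to a high-priority job submitted at 100 (deadline 1100), but wins against any high-priority job submitted after 19000, so it cannot be starved forever.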
>
> There is quite a bit of code to go through here so I think it could
> be even better if other drivers could give it a spin as is and see if
> some improvements can be detected. Or at least no regressions.

Soooo, I have thought about this series a bit more and also read a bit
about the issues Michel recently mentioned.

As Danilo also pointed out, going for an experiment like that right
now is not a good idea: not while the scheduler is still in its
current shape, and not without powerful tools for regression testing.

That said, we are slowly moving in the right direction. I think one of
the things we are lacking is good testing infrastructure. In fact,
writing KUnit tests for the scheduler has been on my list for a while
now (starting with the basics: submitting a number of jobs and so on),
so that we get a better mechanism for detecting regressions. Once we
have more infrastructure for systematic testing, we can gradually
become more open to more daring changes.

Unfortunately, so far I haven't managed to free up time to dedicate to
that effort.
If you, Tvrtko, have capacity for that, I of course wouldn't mind at
all; it could help greatly.

Regards,
Philipp

>
> Cc: Christian König <christian.koenig@xxxxxxx>
> Cc: Danilo Krummrich <dakr@xxxxxxxxxx>
> Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
> Cc: Philipp Stanner <pstanner@xxxxxxxxxx>
>
> Tvrtko Ursulin (14):
>   drm/sched: Delete unused update_job_credits
>   drm/sched: Remove idle entity from tree
>   drm/sched: Implement RR via FIFO
>   drm/sched: Consolidate entity run queue management
>   drm/sched: Move run queue related code into a separate file
>   drm/sched: Ignore own fence earlier
>   drm/sched: Resolve same scheduler dependencies earlier
>   drm/sched: Add deadline policy
>   drm/sched: Remove FIFO and RR and simplify to a single run queue
>   drm/sched: Queue all free credits in one worker invocation
>   drm/sched: Connect with dma-fence deadlines
>   drm/sched: Embed run queue singleton into the scheduler
>   dma-fence: Add helper for custom fence context when merging fences
>   drm/sched: Resolve all job dependencies in one go
>
>  drivers/dma-buf/dma-fence-unwrap.c          |   8 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      |   6 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |  27 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h     |   5 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h   |   8 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |   8 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c     |   8 +-
>  drivers/gpu/drm/scheduler/Makefile          |   2 +-
>  drivers/gpu/drm/scheduler/sched_entity.c    | 316 ++++++-----
>  drivers/gpu/drm/scheduler/sched_fence.c     |   5 +-
>  drivers/gpu/drm/scheduler/sched_main.c      | 587 +++++---------------
>  drivers/gpu/drm/scheduler/sched_rq.c        | 199 +++++++
>  include/drm/gpu_scheduler.h                 |  74 ++-
>  include/linux/dma-fence-unwrap.h            |  31 +-
>  14 files changed, 606 insertions(+), 678 deletions(-)
>  create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
>