From: Matthew Brost <matthew.brost@xxxxxxxxx>

No DRM scheduler changes required, drivers just return NULL in run_job
vfunc.

Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
---

Christian, Alex, Danilo, Lina, and others,

I'd like to kindly ask for your attention here, and probably an ack from you.

Based on [1] and other discussions we have had around these long-running
jobs, there seems to be a rough consensus that we neither need nor want a
change in the drm-scheduler itself, and that the only thing these kinds of
jobs need in the drm-scheduler is the NULL return in the run_job vfunc, as
Matt wrote in the commit message above.

Looking at the AMD code and talking to Alex, I learned that AMD uses user
queues for direct submission in compute cases and that there is currently no
plan to use the drm-scheduler for that. The only thing that looked similar in
the AMD code was the eviction_fence, which resembles Xe's preempt_fence, but
both use dma_fences directly, and it doesn't look worthwhile (or good) to
introduce a middle layer for that there.

I don't know the plans for the other drivers, but it looks like the solution
currently in use by Xe should be enough.

Do you agree with the above? Or do you believe some work around the
drm-scheduler is needed for long-running workloads?

If no further work is needed or desired, I'd like to move this long-running
TODO item to the 'closed' section below. Could you please help me confirm
this, or describe what I might be missing here?

Thanks in advance,
Rodrigo.

[1] https://lore.kernel.org/all/20230404002211.3611376-9-matthew.brost@xxxxxxxxx/

 Documentation/gpu/rfc/xe.rst | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/Documentation/gpu/rfc/xe.rst b/Documentation/gpu/rfc/xe.rst
index b67f8e6a1825..1e1dd6202438 100644
--- a/Documentation/gpu/rfc/xe.rst
+++ b/Documentation/gpu/rfc/xe.rst
@@ -127,21 +127,6 @@ Some parts of userptr like mmu_notifiers should become GPUVA or DRM helpers when
 the second driver supporting VM_BIND+userptr appears. Details to be defined when
 the time comes.
 
-Long running compute: minimal data structure/scaffolding
---------------------------------------------------------
-The generic scheduler code needs to include the handling of endless compute
-contexts, with the minimal scaffolding for preempt-ctx fences (probably on the
-drm_sched_entity) and making sure drm_scheduler can cope with the lack of job
-completion fence.
-
-The goal is to achieve a consensus ahead of Xe initial pull-request, ideally with
-this minimal drm/scheduler work, if needed, merged to drm-misc in a way that any
-drm driver, including Xe, could re-use and add their own individual needs on top
-in a next stage. However, this should not block the initial merge.
-
-This is a non-blocker item since the driver without the support for the long
-running compute enabled is not a showstopper.
-
 Display integration with i915
 -----------------------------
 In order to share the display code with the i915 driver so that there is maximum
@@ -230,3 +215,15 @@ As a key measurable result, Xe needs to be aligned with the GPU VA and working i
 our tree. Missing Nouveau patches should *not* block Xe and any needed GPUVA related
 patch should be independent and present on dri-devel or acked by maintainers to go
 along with the first Xe pull request towards drm-next.
+
+Long running compute: minimal data structure/scaffolding
+--------------------------------------------------------
+The generic scheduler code needs to include the handling of endless compute
+contexts, with the minimal scaffolding for preempt-ctx fences (probably on the
+drm_sched_entity) and making sure drm_scheduler can cope with the lack of job
+completion fence.
+
+The goal is to achieve a consensus ahead of Xe initial pull-request, ideally with
+this minimal drm/scheduler work, if needed, merged to drm-misc in a way that any
+drm driver, including Xe, could re-use and add their own individual needs on top
+in a next stage. However, this should not block the initial merge.
-- 
2.41.0
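
For readers following the thread, the "return NULL in run_job" approach from
the commit message can be sketched roughly as below. This is a userspace
stand-in and not Xe's actual code: the struct definitions, the long_running
flag, the embedded hw_fence, struct backend_ops and example_run_job are all
hypothetical names made up for illustration. Only the shape of the hook (a
run_job callback that takes a job and returns a struct dma_fence pointer)
follows the real drm_sched_backend_ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel types; NOT the real definitions. */
struct dma_fence {
	int seqno;
};

struct drm_sched_job {
	bool long_running;         /* hypothetical flag, illustration only */
	struct dma_fence hw_fence; /* hypothetical embedded completion fence */
};

/* Mirrors the shape of the run_job hook in struct drm_sched_backend_ops. */
struct backend_ops {
	struct dma_fence *(*run_job)(struct drm_sched_job *job);
};

static struct dma_fence *example_run_job(struct drm_sched_job *job)
{
	assert(job != NULL);

	/* A real driver would submit the job to the hardware here. */

	/*
	 * Long-running job: hand back no completion fence, so the
	 * scheduler is never given a fence that might not signal.
	 */
	if (job->long_running)
		return NULL;

	/* Regular job: return the fence that signals on completion. */
	return &job->hw_fence;
}

static const struct backend_ops example_ops = {
	.run_job = example_run_job,
};
```

In this sketch, calling example_ops.run_job() on a job with long_running set
yields NULL, while a regular job yields a pointer to its own fence. How the
driver then tracks and preempts the long-running context (e.g. via preempt
fences) is outside what this sketch covers.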