On 11/02/2025 10:21, Christian König wrote:
Am 11.02.25 um 11:08 schrieb Philipp Stanner:
On Tue, 2025-02-11 at 09:22 +0100, Christian König wrote:
Am 06.02.25 um 17:40 schrieb Tvrtko Ursulin:
Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a
newly
added __drm_sched_entity_queue_pop.
This allows breaking the hidden dependency that queue_node has to
be the
first element in struct drm_sched_job.
A comment is also added with a reference to the mailing list
discussion
explaining the copied helper will be removed when the whole broken
amdgpu_job_stop_all_jobs_on_sched is removed.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
Cc: Christian König <christian.koenig@xxxxxxx>
Cc: Danilo Krummrich <dakr@xxxxxxxxxx>
Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
Cc: Philipp Stanner <phasta@xxxxxxxxxx>
Cc: "Zhang, Hawking" <Hawking.Zhang@xxxxxxx>
Reviewed-by: Christian König <christian.koenig@xxxxxxx>
I think this v3 has been supplanted by a v4 by now.
I've seen the larger v4 series as well, but at least that patch here
looks identical on first glance. So my rb still counts.
Effectively identical - I only removed the double underscore in v4 on a
(good) suggestion from Danilo.
@Tvrtko: btw, do you create patches with
git format-patch -v4 ?
That way the v4 label will be included in all patch titles, too, not
just the cover letter. That makes searching etc. easier in large
inboxes
I don't typically use -vN for the whole series since not all patches in
the series change equally and I assume people use threaded views so vN
in the root is sufficient.
Regards,
Tvrtko
---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++++++++++++++++++-
--
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 100f04475943..22cb48bab24d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct
drm_sched_job *sched_job)
return fence;
}
-#define to_drm_sched_job(sched_job) \
- container_of((sched_job), struct drm_sched_job,
queue_node)
+/*
+ * This is a duplicate function from DRM scheduler
sched_internal.h.
+ * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is
removed, due
+ * latter being incorrect and racy.
+ *
+ * See
https://lore.kernel.org/amd-gfx/44edde63-7181-44fb-
a4f7-94e50514f539@xxxxxxx/
+ */
+static struct drm_sched_job *
+__drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
+{
+ struct spsc_node *node;
+
+ node = spsc_queue_pop(&entity->job_queue);
+ if (!node)
+ return NULL;
+
+ return container_of(node, struct drm_sched_job,
queue_node);
+}
void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler
*sched)
{
@@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct
drm_gpu_scheduler *sched)
struct drm_sched_rq *rq = sched->sched_rq[i];
spin_lock(&rq->lock);
list_for_each_entry(s_entity, &rq->entities, list)
{
- while ((s_job =
to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
+ while ((s_job =
__drm_sched_entity_queue_pop(s_entity))) {
struct drm_sched_fence *s_fence =
s_job->s_fence;
dma_fence_signal(&s_fence-
scheduled);