Re: [PATCH 2/3] drm/amdgpu: Pop jobs from the queue more robustly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 11/02/2025 10:21, Christian König wrote:
Am 11.02.25 um 11:08 schrieb Philipp Stanner:
On Tue, 2025-02-11 at 09:22 +0100, Christian König wrote:
Am 06.02.25 um 17:40 schrieb Tvrtko Ursulin:
Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a
newly
added __drm_sched_entity_queue_pop.

This allows breaking the hidden dependency that queue_node has to
be the
first element in struct drm_sched_job.

A comment is also added with a reference to the mailing list
discussion
explaining the copied helper will be removed when the whole broken
amdgpu_job_stop_all_jobs_on_sched is removed.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
Cc: Christian König <christian.koenig@xxxxxxx>
Cc: Danilo Krummrich <dakr@xxxxxxxxxx>
Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
Cc: Philipp Stanner <phasta@xxxxxxxxxx>
Cc: "Zhang, Hawking" <Hawking.Zhang@xxxxxxx>
Reviewed-by: Christian König <christian.koenig@xxxxxxx>
I think this v3 has been supplanted by a v4 by now.

I've seen the larger v4 series as well, but at least that patch here looks identical on first glance. So my rb still counts.

Effectively identical - I only removed the double underscore in v4 on a (good) suggestion from Danilo.

@Tvrtko: btw, do you create patches with
git format-patch -v4 ?

That way the v4 label will be included in all patch titles, too, not
just the cover letter. That makes searching etc. easier in large
inboxes

I don't typically use -vN for the whole series since not all patches in the series change equally and I assume people use threaded views so vN in the root is sufficient.

Regards,

Tvrtko

---
   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++++++++++++++++++-
--
   1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 100f04475943..22cb48bab24d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct
drm_sched_job *sched_job)
       return fence;
   }
-#define to_drm_sched_job(sched_job)        \
-        container_of((sched_job), struct drm_sched_job,
queue_node)
+/*
+ * This is a duplicate function from DRM scheduler
sched_internal.h.
+ * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is
removed, due
+ * latter being incorrect and racy.
+ *
+ * See
https://lore.kernel.org/amd-gfx/44edde63-7181-44fb- a4f7-94e50514f539@xxxxxxx/
+ */
+static struct drm_sched_job *
+__drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
+{
+    struct spsc_node *node;
+
+    node = spsc_queue_pop(&entity->job_queue);
+    if (!node)
+        return NULL;
+
+    return container_of(node, struct drm_sched_job,
queue_node);
+}
   void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler
*sched)
   {
@@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct
drm_gpu_scheduler *sched)
           struct drm_sched_rq *rq = sched->sched_rq[i];
           spin_lock(&rq->lock);
           list_for_each_entry(s_entity, &rq->entities, list)
{
-            while ((s_job =
to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
+            while ((s_job =
__drm_sched_entity_queue_pop(s_entity))) {
                   struct drm_sched_fence *s_fence =
s_job->s_fence;
                   dma_fence_signal(&s_fence-
scheduled);





[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux