Hi Christian,
On 11/02/2025 10:21, Christian König wrote:
Am 11.02.25 um 11:08 schrieb Philipp Stanner:
On Tue, 2025-02-11 at 09:22 +0100, Christian König wrote:
Am 06.02.25 um 17:40 schrieb Tvrtko Ursulin:
Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a
newly
added __drm_sched_entity_queue_pop.
This allows breaking the hidden dependency that queue_node has to
be the
first element in struct drm_sched_job.
A comment is also added with a reference to the mailing list
discussion
explaining the copied helper will be removed when the whole broken
amdgpu_job_stop_all_jobs_on_sched is removed.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
Cc: Christian König <christian.koenig@xxxxxxx>
Cc: Danilo Krummrich <dakr@xxxxxxxxxx>
Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
Cc: Philipp Stanner <phasta@xxxxxxxxxx>
Cc: "Zhang, Hawking" <Hawking.Zhang@xxxxxxx>
Reviewed-by: Christian König <christian.koenig@xxxxxxx>
I think this v3 has been supplanted by a v4 by now.
I've seen the larger v4 series as well, but at least that patch here
looks identical on first glance. So my rb still counts.
Is it okay for you to merge the whole series (including this single
amdgpu patch) via drm-misc?
Regards,
Tvrtko
@Tvrtko: btw, do you create patches with
git format-patch -v4 ?
That way the v4 label will be included in all patch titles, too, not
just the cover letter. That makes searching etc. easier in large
inboxes
P.
---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++++++++++++++++++-
--
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 100f04475943..22cb48bab24d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct
drm_sched_job *sched_job)
return fence;
}
-#define to_drm_sched_job(sched_job) \
- container_of((sched_job), struct drm_sched_job,
queue_node)
+/*
+ * This is a duplicate function from DRM scheduler
sched_internal.h.
+ * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is
removed, due
+ * latter being incorrect and racy.
+ *
+ * See
https://lore.kernel.org/amd-gfx/44edde63-7181-44fb-
a4f7-94e50514f539@xxxxxxx/
+ */
+static struct drm_sched_job *
+__drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
+{
+ struct spsc_node *node;
+
+ node = spsc_queue_pop(&entity->job_queue);
+ if (!node)
+ return NULL;
+
+ return container_of(node, struct drm_sched_job,
queue_node);
+}
void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler
*sched)
{
@@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct
drm_gpu_scheduler *sched)
struct drm_sched_rq *rq = sched->sched_rq[i];
spin_lock(&rq->lock);
list_for_each_entry(s_entity, &rq->entities, list)
{
- while ((s_job =
to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
+ while ((s_job =
__drm_sched_entity_queue_pop(s_entity))) {
struct drm_sched_fence *s_fence =
s_job->s_fence;
dma_fence_signal(&s_fence-
scheduled);