On 14/02/2025 10:31, Christian König wrote:
On 14.02.25 at 11:21, Tvrtko Ursulin wrote:
Hi Christian,
On 11/02/2025 10:21, Christian König wrote:
On 11.02.25 at 11:08, Philipp Stanner wrote:
On Tue, 2025-02-11 at 09:22 +0100, Christian König wrote:
On 06.02.25 at 17:40, Tvrtko Ursulin wrote:
Replace a copy of DRM scheduler's to_drm_sched_job with a copy of the newly added __drm_sched_entity_queue_pop.

This allows breaking the hidden dependency that queue_node has to be the first element in struct drm_sched_job.

A comment is also added with a reference to the mailing list discussion, explaining that the copied helper will be removed when the whole broken amdgpu_job_stop_all_jobs_on_sched is removed.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
Cc: Christian König <christian.koenig@xxxxxxx>
Cc: Danilo Krummrich <dakr@xxxxxxxxxx>
Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
Cc: Philipp Stanner <phasta@xxxxxxxxxx>
Cc: "Zhang, Hawking" <Hawking.Zhang@xxxxxxx>
Reviewed-by: Christian König <christian.koenig@xxxxxxx>
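[Context on the "hidden dependency" mentioned above, as an illustrative sketch and not part of the patch: container_of() subtracts offsetof(struct drm_sched_job, queue_node) from the pointer spsc_queue_pop() returns, so the NULL it yields for an empty queue only maps back to a NULL job pointer while queue_node sits at offset zero. The stand-alone toy program below, with made-up stand-in structs, shows what breaks if the member were moved:]

#include <stddef.h>
#include <stdio.h>

/* Made-up stand-ins for spsc_node / drm_sched_job, for illustration only. */
struct toy_node { int dummy; };

struct toy_job_first {                  /* queue_node as the first member */
        struct toy_node queue_node;
        int other;
};

struct toy_job_later {                  /* queue_node no longer first */
        int other;
        struct toy_node queue_node;
};

/* Same pointer arithmetic as the kernel's container_of(). */
#define toy_container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

int main(void)
{
        struct toy_node *popped = NULL; /* what popping an empty queue yields */

        /* Offset 0: NULL stays NULL, so a "while (job)" loop terminates. */
        printf("%p\n", (void *)toy_container_of(popped, struct toy_job_first, queue_node));

        /* Non-zero offset: a bogus non-NULL pointer the loop would then dereference. */
        printf("%p\n", (void *)toy_container_of(popped, struct toy_job_later, queue_node));

        return 0;
}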
I think this v3 has been supplanted by a v4 by now.
I've seen the larger v4 series as well, but at least this patch looks identical at first glance. So my r-b still counts.
Is it okay for you to merge the whole series (including this single amdgpu patch) via drm-misc?
I can do that, but doesn't the scheduler maintainer want to pick them up?
Sorry that was some bad and unclear English. :(
It is as you suggest - what I meant was, is it okay from your point of
view that the whole series is merged via drm-misc? I assume Philipp
would indeed be the one to merge it, once all patches get r-b-ed.
Regards,
Tvrtko
@Tvrtko: btw, do you create patches with

git format-patch -v4 ?

That way the v4 label will be included in all patch titles, too, not just the cover letter. That makes searching etc. easier in large inboxes.
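For example, something along these lines (the branch and output directory are just placeholders):

  git format-patch -v4 --cover-letter -o outgoing/ origin/drm-misc-next..HEAD

That stamps every mail in the set with the [PATCH v4 n/m] prefix, not only the cover letter.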
P.
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 100f04475943..22cb48bab24d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct drm_sched_job *sched_job)
         return fence;
 }
 
-#define to_drm_sched_job(sched_job)            \
-                container_of((sched_job), struct drm_sched_job, queue_node)
+/*
+ * This is a duplicate function from DRM scheduler sched_internal.h.
+ * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is removed, due
+ * to the latter being incorrect and racy.
+ *
+ * See https://lore.kernel.org/amd-gfx/44edde63-7181-44fb-a4f7-94e50514f539@xxxxxxx/
+ */
+static struct drm_sched_job *
+__drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
+{
+        struct spsc_node *node;
+
+        node = spsc_queue_pop(&entity->job_queue);
+        if (!node)
+                return NULL;
+
+        return container_of(node, struct drm_sched_job, queue_node);
+}
 
 void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
 {
@@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
                 struct drm_sched_rq *rq = sched->sched_rq[i];
                 spin_lock(&rq->lock);
                 list_for_each_entry(s_entity, &rq->entities, list) {
-                        while ((s_job = to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
+                        while ((s_job = __drm_sched_entity_queue_pop(s_entity))) {
                                 struct drm_sched_fence *s_fence = s_job->s_fence;
 
                                 dma_fence_signal(&s_fence->scheduled);
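[One more illustrative aside, not part of the series: keeping the old macro would have meant spelling out its silent assumption, roughly like the hypothetical guard below, whereas the patch removes the assumption altogether by NULL-checking before container_of(), the same shape as the scheduler's own __drm_sched_entity_queue_pop().]

/* Hypothetical guard, NOT in the patch: pin down what the old macro assumed. */
static_assert(offsetof(struct drm_sched_job, queue_node) == 0,
              "to_drm_sched_job() relies on queue_node being the first member");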