Re: [PATCH] drm/amdgpu: Initialize schedulers before using them

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





Am 23.10.23 um 05:23 schrieb Luben Tuikov:
Initialize ring schedulers before using them, very early in the amdgpu boot,
at PCI probe time, specifically at frame-buffer dumb-create at fill-buffer.

This was discovered by using dynamic scheduler run-queues, which showed that
amdgpu was using a scheduler before calling drm_sched_init(), and the only
reason it was working was because sched_rq[] was statically allocated in the
scheduler structure. However, the scheduler structure had _not_ been
initialized.

When switching to dynamically allocated run-queues, this lack of
initialization was causing an oops and a blank screen at boot up. This patch
fixes this amdgpu bug.

This patch depends on the "drm/sched: Convert the GPU scheduler to variable
number of run-queues" patch, as that patch prevents subsequent scheduler
initialization if a scheduler has already been initialized.

Cc: Christian König <christian.koenig@xxxxxxx>
Cc: Alex Deucher <Alexander.Deucher@xxxxxxx>
Cc: Felix Kuehling <Felix.Kuehling@xxxxxxx>
Cc: AMD Graphics <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Luben Tuikov <luben.tuikov@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 ++++++++++++++
  1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 4e51dce3aab5d6..575ef7e1e30fd4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -60,6 +60,7 @@
  #include "amdgpu_atomfirmware.h"
  #include "amdgpu_res_cursor.h"
  #include "bif/bif_4_1_d.h"
+#include "amdgpu_reset.h"
MODULE_IMPORT_NS(DMA_BUF); @@ -2059,6 +2060,19 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable) ring = adev->mman.buffer_funcs_ring;
  		sched = &ring->sched;
+
+		r = drm_sched_init(sched, &amdgpu_sched_ops,
+				   DRM_SCHED_PRIORITY_COUNT,
+				   ring->num_hw_submission, 0,
+				   adev->sdma_timeout, adev->reset_domain->wq,
+				   ring->sched_score, ring->name,
+				   adev->dev);
+		if (r) {
+			drm_err(adev, "%s: couldn't initialize ring:%s error:%d\n",
+				__func__, ring->name, r);
+			return;
+		}

That doesn't look correct either.

amdgpu_ttm_set_buffer_funcs_status() should only be called with enable=true as argument *after* the copy ring is initialized and valid to use. One part of this ring initialization is to setup the scheduler.


+
  		r = drm_sched_entity_init(&adev->mman.high_pr,
  					  DRM_SCHED_PRIORITY_KERNEL, &sched,
  					  1, NULL);

That here looks totally incorrect and misplaced to me. amdgpu_ttm_set_buffer_funcs_status() should only enabled/disable using the copy functions and not really initialize the entity.

So the entity should only be created when enable=true and it should especially *not* re-created all the time without properly destroying it.

Can you look at the history of the code? I'm pretty sure that this was at some time correctly implemented.

Thanks,
Christian.


base-commit: 05d3ef8bba77c1b5f98d941d8b2d4aeab8118ef1
prerequisite-patch-id: c52673df9b6fc9ee001d6261c7ac107b618912a0




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux