Re: [v4 5/5] drm/amdgpu: fix invalid memory access in amdgpu_fence_driver_sw_fini()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 10.01.25 um 08:37 schrieb Gerry Liu:
2025年1月10日 14:51,Christian König <christian.koenig@xxxxxxx> 写道:

Am 10.01.25 um 03:08 schrieb Jiang Liu:
Function detects initialization status by checking sched->ops, so set
sched->ops to non-NULL just before return in function
amdgpu_fence_driver_sw_fini() and amdgpu_device_init_schedulers()
to avoid possible invalid memory access on error recover path.

Signed-off-by: Jiang Liu <gerry@xxxxxxxxxxxxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 4 +++-
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f29a4ad3c6d0..3688e9eb949b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2857,6 +2857,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
  		if (r) {
  			DRM_ERROR("Failed to create scheduler on ring %s.\n",
  				  ring->name);
+			ring->sched.ops = NULL;
  			return r;
  		}
  		r = amdgpu_uvd_entity_init(adev, ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 2f24a6aa13bf..b5e87b515139 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -656,8 +656,10 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
  		 * The natural check would be sched.ready, which is
  		 * set as drm_sched_init() finishes...
  		 */
-		if (ring->sched.ops)
+		if (ring->sched.ops) {
  			drm_sched_fini(&ring->sched);
+			ring->sched.ops = NULL;
+		}
As said in my previous reply we should really stop checking sched.ops here.

The scheduler should not be cleaned up by this function in the first place.
Hi Christian,
	The current workflow is as below:
amdgpu_device_init
         amdgpu_fence_driver_sw_init
         amdgpu_device_ip_init
                 amdgpu_device_init_schedulers
                         drm_sched_init
         amdgpu_fence_driver_hw_init
amdgpu_device_fini_hw
         amdgpu_fence_driver_hw_fini
amdgpu_device_fini_sw
         amdgpu_device_ip_fini
         amdgpu_fence_driver_sw_fini
                 drm_sched_fini

	As you have mentioned, we should introduce amdgpu_device_fini_schedulers() and gets it called by amdgpu_device_ip_fini().
But I have no idea about why currently drm_sched_fini() is called by amdgpu_fence_driver_sw_fini() and whether it’s safe to move it into amdgpu_device_ip_fini().

If I remember correctly we just put the drm_sched_fini() with the ops check as cheap way of cleaning up all schedulers before we had the ring->no_scheduler flag which indicates if the scheduler is initialized or not in the first place.

So it should be save to move the drm_sched_fini() into amdgpu_device_ip_fini() as well. Just check ring->no_scheduler instead of ring->sched.ops to decide if the scheduler needs to be cleaned up or not.

Regards,
Christian.


Thanks,
Gerry

Regards,
Christian.

    		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
  			dma_fence_put(ring->fence_drv.fences[j]);




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux