Re: [PATCH] drm/amdgpu: Fix a NULL pointer of fence

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 2022-07-07 um 05:54 schrieb Christian König:
Am 07.07.22 um 11:50 schrieb xinhui pan:
Fence is accessed by dma_resv_add_fence() now.
Use amdgpu_amdkfd_remove_eviction_fence instead.

Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0036c9e405af..1e25c400ce4f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1558,10 +1558,10 @@ void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
        if (!process_info)
          return;
-
      /* Release eviction fence from PD */
      amdgpu_bo_reserve(pd, false);
-    amdgpu_bo_fence(pd, NULL, false);
+    amdgpu_amdkfd_remove_eviction_fence(pd,
+                    process_info->eviction_fence);

Good catch as well, but Felix needs to take a look at this.

This is weird. We used amdgpu_bo_fence(pd, NULL, false) here, which would have removed an exclusive fence. But as far as I can tell we added the fence as a shared fence in init_kfd_vm and amdgpu_amdkfd_gpuvm_restore_process_bos. So this probably never worked as intended.

You could try if this is really needed. Just remove the eviction fence removal. Then enable eviction debugging with

    echo Y > /sys/module/amdgpu/parameters/debug_evictions

Run some simple tests and check the kernel log to see if process termination is causing any unexpected evictions.

Regards,
  Felix



Regards,
Christian.

      amdgpu_bo_unreserve(pd);
        /* Update process info */




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux