Re: [PATCH v7 4/4] drm/amdgpu: track bo memory stats at runtime

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 13.11.24 um 15:09 schrieb Li, Yunxiang (Teddy):
[Public]

From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Wednesday, November 13, 2024 3:49
Am 12.11.24 um 19:16 schrieb Li, Yunxiang (Teddy):
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Sent: Tuesday, November 12, 2024 5:54
Am 10.11.24 um 16:41 schrieb Yunxiang Li:
@@ -2612,7 +2707,6 @@ void amdgpu_vm_fini(struct amdgpu_device
*adev, struct amdgpu_vm *vm)

      root = amdgpu_bo_ref(vm->root.bo);
      amdgpu_bo_reserve(root, true);
-   amdgpu_vm_put_task_info(vm->task_info);
      amdgpu_vm_set_pasid(adev, vm, 0);
      dma_fence_wait(vm->last_unlocked, false);
      dma_fence_put(vm->last_unlocked); @@ -2660,6 +2754,15 @@ void
amdgpu_vm_fini(struct amdgpu_device *adev,
struct amdgpu_vm *vm)
              }
      }

+   if (!amdgpu_vm_stats_is_zero(vm)) {
+           struct amdgpu_task_info *ti = vm->task_info;
+
+           dev_warn(adev->dev,
+                    "VM memory stats for proc %s(%d) task %s(%d) is
+ non-zero
when fini\n",
+                    ti->process_name, ti->pid, ti->task_name, ti->tgid);
+   }
+
+   amdgpu_vm_put_task_info(vm->task_info);
Please don't move the call to amdgpu_vm_put_task_info().
Is keeping the task_info alive a hazard here? I could copy out the info, it just
seemed a bit wasteful.

Ah, now I see why you have moved that.

IIRC we need to free up the task info before releasing the PASID, but that info might
be outdated. Need to check the code.

Does it work if you move the message further up or does the root PD then break
your neck because it isn't released yet?

Thanks,
Christian.
It needs to be after root BO is deleted. I think there's a way to go from pasid to task_info but not the other way around, so it should be safe? It's okay if the pasid/pid/etc gets recycled before we get here and we print outdated info since it's just so we know which application we should use to try to reproduce the bug.

Double checked that and the order is actually incorrect right now and your patch here is fixing it!

The call to amdgpu_vm_put_task_info() needs to be after the call to amdgpu_vm_set_pasid(adev, vm, 0) or otherwise fault handling might use a freed up task info.

So feel free to completely ignore my comment, you're actually fixing things.

Regards,
Christian.



Teddy




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux