RE: [PATCH v7 4/4] drm/amdgpu: track bo memory stats at runtime

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[Public]

> From: Koenig, Christian <Christian.Koenig@xxxxxxx>
> Sent: Wednesday, November 13, 2024 3:49
> Am 12.11.24 um 19:16 schrieb Li, Yunxiang (Teddy):
> >> From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
> >> Sent: Tuesday, November 12, 2024 5:54
> >> Am 10.11.24 um 16:41 schrieb Yunxiang Li:
> >>> @@ -2612,7 +2707,6 @@ void amdgpu_vm_fini(struct amdgpu_device
> >>> *adev, struct amdgpu_vm *vm)
> >>>
> >>>      root = amdgpu_bo_ref(vm->root.bo);
> >>>      amdgpu_bo_reserve(root, true);
> >>> -   amdgpu_vm_put_task_info(vm->task_info);
> >>>      amdgpu_vm_set_pasid(adev, vm, 0);
> >>>      dma_fence_wait(vm->last_unlocked, false);
> >>>      dma_fence_put(vm->last_unlocked); @@ -2660,6 +2754,15 @@ void
> >>> amdgpu_vm_fini(struct amdgpu_device *adev,
> >> struct amdgpu_vm *vm)
> >>>              }
> >>>      }
> >>>
> >>> +   if (!amdgpu_vm_stats_is_zero(vm)) {
> >>> +           struct amdgpu_task_info *ti = vm->task_info;
> >>> +
> >>> +           dev_warn(adev->dev,
> >>> +                    "VM memory stats for proc %s(%d) task %s(%d) is
> >>> + non-zero
> >> when fini\n",
> >>> +                    ti->process_name, ti->pid, ti->task_name, ti->tgid);
> >>> +   }
> >>> +
> >>> +   amdgpu_vm_put_task_info(vm->task_info);
> >> Please don't move the call to amdgpu_vm_put_task_info().
> > Is keeping the task_info alive a hazard here? I could copy out the info, it just
> seemed a bit wasteful.
>
> Ah, now I see why you have moved that.
>
> IIRC we need to free up the task info before releasing the PASID, but that info might
> be outdated. Need to check the code.
>
> Does it work if you move the message further up or does the root PD then break
> your neck because it isn't released yet?
>
> Thanks,
> Christian.

It needs to be after root BO is deleted. I think there's a way to go from pasid to task_info but not the other way around, so it should be safe? It's okay if the pasid/pid/etc gets recycled before we get here and we print outdated info since it's just so we know which application we should use to try to reproduce the bug.

Teddy




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux