Re: [PATCH] drm/amdgpu: fix reset domain xgmi hive info reference leak

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 2022-08-11 um 09:42 schrieb Jonathan Kim:
When an xgmi node is added to the hive, it takes another hive
reference for its reset domain.

This extra reference was not dropped on device removal from the
hive so drop it.

Signed-off-by: Jonathan Kim <jonathan.kim@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 1b108d03e785..560bf1c98f08 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -731,6 +731,9 @@ int amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
  	mutex_unlock(&hive->hive_lock);
amdgpu_put_xgmi_hive(hive);
+	/* device is removed from the hive so remove its reset domain reference */
+	if (adev->reset_domain && adev->reset_domain == hive->reset_domain)
+		amdgpu_put_xgmi_hive(hive);

This is some messed up reference counting. If you need an extra reference from the reset_domain to the hive, that should be owned by the reset_domain and dropped when the reset_domain is destroyed. And it's only one reference for the reset_domain, not one reference per adev in the reset_domain.

What you're doing here looks like every adev that's in a reset_domain of its hive has two references to the hive. And if you're dropping the extra reference here, it still leaves the reset_domain with a dangling pointer to a hive that may no longer exist. So this extra reference is kind of pointless.

Regards,
  Felix


  	adev->hive = NULL;
if (atomic_dec_return(&hive->number_devices) == 0) {



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux