Re: [PATCH] drm/amdgpu: Check hive->reset_domain not NULL before releasing it.

On 2022-11-01 14:49, Gavin Wan wrote:
A recent change introduced a bug in the SRIOV environment: the
kernel crashes when amdgpu is unloaded on a guest VM with a hive
configuration. Under SRIOV, hive->reset_domain is not used and is
never initialized, but the release code did not check whether it
is NULL before releasing it.

Check that hive->reset_domain is not NULL before releasing it.

Fixed: d95e8e97e2d5 ("drm/amdgpu: refine create and release logic of hive info")

The tag should be named "Fixes", not "Fixed".


Signed-off-by: Gavin Wan <Gavin.Wan@xxxxxxx>
Change-Id: I17189e4d7357e399c6b70e43c24051356c025a3a

Please remove the Change-Id.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 11 +++++++++--
  1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 47159e9a0884..371c4f1aac2b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -217,8 +217,15 @@ static void amdgpu_xgmi_hive_release(struct kobject *kobj)
  	struct amdgpu_hive_info *hive = container_of(
  		kobj, struct amdgpu_hive_info, kobj);
-	amdgpu_reset_put_reset_domain(hive->reset_domain);
-	hive->reset_domain = NULL;
+	/**

Remove the extra *. /** is used to denote doc-comments, and this is not one.
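
To illustrate the distinction, here is a minimal sketch of the two comment styles. The function clamp_percent() is a hypothetical example and is not part of amdgpu; the kernel-doc block documents the function's interface, while the comment inside the body is an ordinary block comment.

```c
/**
 * clamp_percent() - Clamp a value to the range 0..100.
 * @value: Input value.
 *
 * Return: @value limited to the range 0..100.
 */
static int clamp_percent(int value)
{
	/*
	 * Ordinary block comment: a single asterisk after the slash,
	 * because this text explains the code and documents no API.
	 */
	if (value < 0)
		return 0;
	if (value > 100)
		return 100;
	return value;
}
```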


+	 * The hive->reset_domain is only initialized for non-SRIOV
+	 * configurations, so check that it is not NULL before
+	 * releasing it.
+	 */
+	if (hive->reset_domain) {
+		amdgpu_reset_put_reset_domain(hive->reset_domain);

It may be better to do the NULL pointer check inside amdgpu_reset_put_reset_domain. In fact, current staging already has a check there, so this patch is unnecessary. Just sync your branch. It was added by this commit:

commit d6a7ab1e0168a96b6cb0e386399e54af4fe39af4
Author: Vignesh Chander <Vignesh.Chander@xxxxxxx>
Date:   Wed Sep 28 14:59:45 2022 -0400

    drm/amdgpu: Skip put_reset_domain if it doesn't exist

    For xgmi sriov, the reset is handled by host driver and hive->reset_domain
    is not initialized so need to check if it exists before doing a put.
    Signed-off-by: Vignesh Chander <Vignesh.Chander@xxxxxxx>
    Reviewed-by: Shaoyun Liu <Shaoyun.Liu@xxxxxxx>
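
As a rough userspace sketch of why a check inside the put helper makes the caller-side check redundant: the types and functions below (reset_domain, hive_info, reset_domain_put, hive_release) are hypothetical stand-ins, not the actual amdgpu code, and a plain integer replaces the kernel's reference counting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted reset domain. */
struct reset_domain {
	int refcount;
};

/* Number of domains freed so far; present only to make behavior observable. */
static int domains_freed;

static struct reset_domain *reset_domain_create(void)
{
	struct reset_domain *domain = calloc(1, sizeof(*domain));

	if (domain)
		domain->refcount = 1;
	return domain;
}

/* Drop one reference; a NULL domain is accepted and ignored. */
static void reset_domain_put(struct reset_domain *domain)
{
	if (!domain)
		return;

	assert(domain->refcount > 0);
	if (--domain->refcount == 0) {
		free(domain);
		domains_freed++;
	}
}

/* Hypothetical stand-in for the hive; reset_domain stays NULL under SRIOV. */
struct hive_info {
	struct reset_domain *reset_domain;
};

/* Release path: no NULL check is needed at the call site. */
static void hive_release(struct hive_info *hive)
{
	reset_domain_put(hive->reset_domain);
	hive->reset_domain = NULL;
	free(hive);
}
```

With the check in one place, every caller of the put helper is covered, including release paths added later.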

Regards,
  Felix


+		hive->reset_domain = NULL;
+	}
  	mutex_destroy(&hive->hive_lock);
  	kfree(hive);


