A recent change introduced a bug in SRIOV environments: the kernel
crashes when amdgpu is unloaded on a guest VM with a hive configuration.

hive->reset_domain is not used (and is never initialized) for SRIOV, but
amdgpu_xgmi_hive_release() released it without checking it first.
Check that hive->reset_domain is not NULL before releasing it.

Fixes: d95e8e97e2d5 ("drm/amdgpu: refine create and release logic of hive info")
Signed-off-by: Gavin Wan <Gavin.Wan@xxxxxxx>
Change-Id: I17189e4d7357e399c6b70e43c24051356c025a3a
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 47159e9a0884..371c4f1aac2b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -217,8 +217,15 @@ static void amdgpu_xgmi_hive_release(struct kobject *kobj)
 	struct amdgpu_hive_info *hive = container_of(
 		kobj, struct amdgpu_hive_info, kobj);
 
-	amdgpu_reset_put_reset_domain(hive->reset_domain);
-	hive->reset_domain = NULL;
+	/*
+	 * hive->reset_domain is only initialized for non-SRIOV
+	 * configurations, so check that it is not NULL before
+	 * releasing it.
+	 */
+	if (hive->reset_domain) {
+		amdgpu_reset_put_reset_domain(hive->reset_domain);
+		hive->reset_domain = NULL;
+	}
 
 	mutex_destroy(&hive->hive_lock);
 	kfree(hive);
-- 
2.34.1
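
For illustration only, the guarded release used in this patch can be sketched in user space. Everything below is a hypothetical stand-in (struct reset_domain, struct hive_info, reset_domain_put() and hive_release() are not the amdgpu API), and it assumes a put helper that dereferences its argument and therefore must never be handed a NULL pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted reset domain. */
struct reset_domain {
	int refcount;
};

/* Hypothetical stand-in for the hive object; reset_domain may be NULL. */
struct hive_info {
	struct reset_domain *reset_domain;
};

/*
 * Drop one reference and free the domain on the last put.
 * This helper dereferences its argument, so NULL is not allowed.
 */
void reset_domain_put(struct reset_domain *domain)
{
	assert(domain != NULL);
	if (--domain->refcount == 0)
		free(domain);
}

/*
 * Release a hive. The domain reference is dropped only if one was
 * ever set up, mirroring the NULL check added by the patch.
 */
void hive_release(struct hive_info *hive)
{
	if (hive->reset_domain) {
		reset_domain_put(hive->reset_domain);
		hive->reset_domain = NULL;
	}
	free(hive);
}
```

In this sketch, a hive allocated with calloc() and never given a domain is released without touching the put helper, which corresponds to the SRIOV case described above.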