[Public] My BAD to misunderstand this. There are both spell typos in patch subject and body, s/iff/if. The patch is: Reviewed-by: Guchun Chen <guchun.chen@xxxxxxx> Please wait for the ack from Andrey and Christian before pushing this. Regards, Guchun -----Original Message----- From: Shi, Leslie <Yuliang.Shi@xxxxxxx> Sent: Thursday, December 16, 2021 3:00 PM To: Chen, Guchun <Guchun.Chen@xxxxxxx>; Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Subject: RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure [Public] Hi Guchun, As Andrey says, "we should not call amdgpu_device_unmap_mmio unless device is unplugged", I think we should call amdgpu_device_unmap_mmio() only if device is unplugged (drm_dev_enter() return false) . +if (!drm_dev_enter(adev_to_drm(adev), &idx)) + amdgpu_device_unmap_mmio(adev); + else # drm_dev_exit(idx); Regards, Leslie -----Original Message----- From: Chen, Guchun <Guchun.Chen@xxxxxxx> Sent: Thursday, December 16, 2021 2:46 PM To: Shi, Leslie <Yuliang.Shi@xxxxxxx>; Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Subject: RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure [Public] Hi Leslie, I think we need to modify it like: +if (drm_dev_enter(adev_to_drm(adev), &idx)) { + amdgpu_device_unmap_mmio(adev); + drm_dev_exit(idx); +} Also you need to credit Andrey a 'suggested-by' in your patch. Regards, Guchun -----Original Message----- From: Shi, Leslie <Yuliang.Shi@xxxxxxx> Sent: Thursday, December 16, 2021 2:14 PM To: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Cc: Chen, Guchun <Guchun.Chen@xxxxxxx>; Shi, Leslie <Yuliang.Shi@xxxxxxx> Subject: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure [Why] In amdgpu_driver_load_kms, when amdgpu_device_init returns error during driver modprobe, it will start the error handle path immediately and call into amdgpu_device_unmap_mmio as well to release mapped VRAM. However, in the following release callback, driver stills visits the unmapped memory like vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs. [How] call amdgpu_device_unmap_mmio() iff device is unplugged to prevent invalid memory address in vcn_v3_0_sw_fini() when GPU initialization failure. Signed-off-by: Leslie Shi <Yuliang.Shi@xxxxxxx> --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index fb03d75880ec..d3656e7b60c2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3845,6 +3845,8 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev) */ void amdgpu_device_fini_hw(struct amdgpu_device *adev) { + int idx; + dev_info(adev->dev, "amdgpu: finishing device.\n"); flush_delayed_work(&adev->delayed_init_work); if (adev->mman.initialized) { @@ -3888,7 +3890,11 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) amdgpu_gart_dummy_page_fini(adev); - amdgpu_device_unmap_mmio(adev); + if (!drm_dev_enter(adev_to_drm(adev), &idx)) + amdgpu_device_unmap_mmio(adev); + else + drm_dev_exit(idx); + } void amdgpu_device_fini_sw(struct amdgpu_device *adev) -- 2.25.1