Re: [PATCH v3] drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Adding back public list and Leslie specifically.

Lijo is right and it's not MTTR release only, also all the unmaps in amdgpu_device_unmap_mmio since this patch makes call to amdgpu_device_unmap_mmio conditioned on device unplugged we need then to still take care of unmapping even when device is NOT unplugged - for this i suggest to look at 'drm/amdgpu: Unmap all MMIO mappings' and just conditionally call all the deleted unmaps
in the patch where the condition is 'if (drm_dev_enter(dev))'.

Andrey

On 2021-12-23 8:47 a.m., Lazar, Lijo wrote:
[AMD Official Use Only]

Actually, I was asking specifically about this -

gmc_v9_0_sw_init -> amdgpu_bo_init->    adev->gmc.vram_mtrr = arch_phys_wc_add(adev->gmc.aper_base,
                                 adev->gmc.aper_size);


As per this patch, if the driver load failed due to some error which happens during hw_init(), this action is not undone. I was thinking why it's not considered important to skip this on driver failure to load. For ex: when some MTRR register is reserved for this mapping.

Thanks,
Lijo

-----Original Message-----
From: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>
Sent: Friday, December 17, 2021 9:05 PM
To: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
Subject: Re: [PATCH v3] drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure

We do unmap on unload, this function is a collection of all MMIO unmapps we already do on unload, it's just does them early on in case of device hot removal. Before pci_driver.remove callback (amdgpu_pci_remove) finish execution.

Andrey

On 2021-12-17 10:23 a.m., Lazar, Lijo wrote:
[AMD Official Use Only]

As a side note,  even if it's a failed driver load, why it is not important to undo the mappings created during driver load? I'm wondering what is the impact on a system like MI200 A+A.

Thanks,
Lijo

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
Andrey Grodzovsky
Sent: Friday, December 17, 2021 8:32 PM
To: Koenig, Christian <Christian.Koenig@xxxxxxx>; Shi, Leslie
<Yuliang.Shi@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>; Deucher,
Alexander <Alexander.Deucher@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Chen, Guchun <Guchun.Chen@xxxxxxx>
Subject: Re: [PATCH v3] drm/amdgpu: Call amdgpu_device_unmap_mmio() if
device is unplugged to prevent crash in GPU initialization failure

Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>

Andrey

On 2021-12-17 3:49 a.m., Christian König wrote:
Am 17.12.21 um 03:26 schrieb Leslie Shi:
[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns error
during driver modprobe, it will start the error handle path
immediately and call into amdgpu_device_unmap_mmio as well to
release mapped VRAM. However, in the following release callback,
driver stills visits the unmapped memory like
vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs.

[How]
call amdgpu_device_unmap_mmio() if device is unplugged to prevent
invalid memory address in
vcn_v3_0_sw_fini() when GPU initialization failure.

Signed-off-by: Leslie Shi <Yuliang.Shi@xxxxxxx>
Looks sane to me, but Andrey should probably nood as well.

Acked-by: Christian König <christian.koenig@xxxxxxx>

---
    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
    1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..f6a47b927cfd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3899,7 +3899,9 @@ void amdgpu_device_fini_hw(struct
amdgpu_device
*adev)
          amdgpu_gart_dummy_page_fini(adev);
    -    amdgpu_device_unmap_mmio(adev);
+    if (drm_dev_is_unplugged(adev_to_drm(adev)))
+        amdgpu_device_unmap_mmio(adev);
+
    }
      void amdgpu_device_fini_sw(struct amdgpu_device *adev)



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux