Adding back public list and Leslie specifically.
Lijo is right and it's not MTTR release only, also all the unmaps in
amdgpu_device_unmap_mmio
since this patch makes call to amdgpu_device_unmap_mmio conditioned on
device unplugged
we need then to still take care of unmapping even when device is NOT
unplugged - for this i suggest
to look at 'drm/amdgpu: Unmap all MMIO mappings' and just conditionally
call all the deleted unmaps
in the patch where the condition is 'if (drm_dev_enter(dev))'.
Andrey
On 2021-12-23 8:47 a.m., Lazar, Lijo wrote:
[AMD Official Use Only]
Actually, I was asking specifically about this -
gmc_v9_0_sw_init -> amdgpu_bo_init-> adev->gmc.vram_mtrr = arch_phys_wc_add(adev->gmc.aper_base,
adev->gmc.aper_size);
As per this patch, if the driver load failed due to some error which happens during hw_init(), this action is not undone. I was thinking why it's not considered important to skip this on driver failure to load. For ex: when some MTRR register is reserved for this mapping.
Thanks,
Lijo
-----Original Message-----
From: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>
Sent: Friday, December 17, 2021 9:05 PM
To: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
Subject: Re: [PATCH v3] drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure
We do unmap on unload, this function is a collection of all MMIO unmapps we already do on unload, it's just does them early on in case of device hot removal. Before pci_driver.remove callback (amdgpu_pci_remove) finish execution.
Andrey
On 2021-12-17 10:23 a.m., Lazar, Lijo wrote:
[AMD Official Use Only]
As a side note, even if it's a failed driver load, why it is not important to undo the mappings created during driver load? I'm wondering what is the impact on a system like MI200 A+A.
Thanks,
Lijo
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
Andrey Grodzovsky
Sent: Friday, December 17, 2021 8:32 PM
To: Koenig, Christian <Christian.Koenig@xxxxxxx>; Shi, Leslie
<Yuliang.Shi@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>; Deucher,
Alexander <Alexander.Deucher@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Chen, Guchun <Guchun.Chen@xxxxxxx>
Subject: Re: [PATCH v3] drm/amdgpu: Call amdgpu_device_unmap_mmio() if
device is unplugged to prevent crash in GPU initialization failure
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
Andrey
On 2021-12-17 3:49 a.m., Christian König wrote:
Am 17.12.21 um 03:26 schrieb Leslie Shi:
[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns error
during driver modprobe, it will start the error handle path
immediately and call into amdgpu_device_unmap_mmio as well to
release mapped VRAM. However, in the following release callback,
driver stills visits the unmapped memory like
vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs.
[How]
call amdgpu_device_unmap_mmio() if device is unplugged to prevent
invalid memory address in
vcn_v3_0_sw_fini() when GPU initialization failure.
Signed-off-by: Leslie Shi <Yuliang.Shi@xxxxxxx>
Looks sane to me, but Andrey should probably nood as well.
Acked-by: Christian König <christian.koenig@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..f6a47b927cfd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3899,7 +3899,9 @@ void amdgpu_device_fini_hw(struct
amdgpu_device
*adev)
amdgpu_gart_dummy_page_fini(adev);
- amdgpu_device_unmap_mmio(adev);
+ if (drm_dev_is_unplugged(adev_to_drm(adev)))
+ amdgpu_device_unmap_mmio(adev);
+
}
void amdgpu_device_fini_sw(struct amdgpu_device *adev)