I'd prefer a separate patch and code review for the fini-case, because
that addresses a different (potential) problem.
Thanks,
Felix
On 2022-10-04 15:43, Philip Yang wrote:
On 2022-10-04 15:16, Felix Kuehling wrote:
On 2022-10-04 12:41, Philip Yang wrote:
amdkfd_total_mem_size is the size of total GPUs vram plus system memory
to estimate page tables memory usage and leave enough VRAM room for
page
tables allocation.
Calculate amdkfd_total_mem_size in amdgpu_amdkfd_device_probe is
incorrect because adev->gmc.real_vram_size is still 0 called from
amdgpu_device_ip_early_init. Move the calculation
to amdgpu_amdkfd_device_init to get the correct VRAM size.
Signed-off-by: Philip Yang <Philip.Yang@xxxxxxx>
Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
Semi-related to this, there should probably be a reverse calculation
in amdgpu_amdkfd_device_fini_sw to support hot-unplugging GPUs.
I will add the reverse calculation then submit.
Regards,
Philip
Regards,
Felix
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 9e98f3866edc..049d192c7cdf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -75,9 +75,6 @@ void amdgpu_amdkfd_device_probe(struct
amdgpu_device *adev)
return;
adev->kfd.dev = kgd2kfd_probe(adev, vf);
-
- if (adev->kfd.dev)
- amdgpu_amdkfd_total_mem_size += adev->gmc.real_vram_size;
}
/**
@@ -201,6 +198,8 @@ void amdgpu_amdkfd_device_init(struct
amdgpu_device *adev)
adev->kfd.init_complete = kgd2kfd_device_init(adev->kfd.dev,
adev_to_drm(adev), &gpu_resources);
+ amdgpu_amdkfd_total_mem_size += adev->gmc.real_vram_size;
+
INIT_WORK(&adev->kfd.reset_work, amdgpu_amdkfd_reset_work);
}
}