On 2024-07-31 04:10, Shikang Fan wrote:
Move kgd2kfd_init_zone_device() after release_full_gpu() as it takes
long time for asics with huge bar size and it could potentially hit
full access timeout for SRIOV during init.

Signed-off-by: Shikang Fan <shikang.fan@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3a43754e7f10..4494fa7ae70f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2930,10 +2930,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	amdgpu_ttm_set_buffer_funcs_status(adev, true);
 
 	/* Don't init kfd if whole hive need to be reset during init */
-	if (!adev->gmc.xgmi.pending_reset) {
-		kgd2kfd_init_zone_device(adev);
+	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
-	}
 
 	amdgpu_fru_get_product_info(adev);
 
@@ -4362,6 +4360,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 		flush_delayed_work(&adev->delayed_init_work);
 	}
 
+	/* On asics with huge bar size, memremap_pages can take long time
+	 * and potentially leading to full access timeout for SRIOV. Move
+	 * init_zone_device() after exit full gpu
+	 */
+	if (!adev->gmc.xgmi.pending_reset)
+		kgd2kfd_init_zone_device(adev);
+
This change will not work: amdgpu_amdkfd_device_init() checks KFD_IS_SVM_API_SUPPORTED, which always returns false if kgd2kfd_init_zone_device() has not been called yet. As a result, the SVM API is not enabled for user space.
Maybe you can move the two function calls here together, if there is
no other init dependency issue.
	/* Don't init kfd if whole hive need to be reset during init */
	if (!adev->gmc.xgmi.pending_reset) {
		kgd2kfd_init_zone_device(adev);
		amdgpu_amdkfd_device_init(adev);
	}
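
To illustrate the ordering problem, here is a minimal user-space sketch. It is not the driver code: struct fake_adev, init_zone_device(), svm_api_supported(), kfd_device_init() and the helper functions are hypothetical stand-ins, and the check only assumes that SVM support depends on state set up by the zone device init and is sampled once during KFD device init.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not the real amdgpu_device. */
struct fake_adev {
	int pgmap_type;   /* stays 0 until the zone device is initialized */
	bool svm_enabled; /* latched once, at KFD device init time */
};

/* Assumed analogue of kgd2kfd_init_zone_device(). */
static void init_zone_device(struct fake_adev *adev)
{
	adev->pgmap_type = 1;
}

/* Assumed analogue of the KFD_IS_SVM_API_SUPPORTED check. */
static bool svm_api_supported(const struct fake_adev *adev)
{
	return adev->pgmap_type != 0;
}

/* Assumed analogue of amdgpu_amdkfd_device_init(): samples the check once. */
static void kfd_device_init(struct fake_adev *adev)
{
	adev->svm_enabled = svm_api_supported(adev);
}

/* Run both init steps in the chosen order; report whether SVM ended up enabled. */
static bool svm_enabled_after_init(bool zone_device_first)
{
	struct fake_adev adev = {0};

	if (zone_device_first) {
		init_zone_device(&adev);
		kfd_device_init(&adev);
	} else {
		kfd_device_init(&adev);
		init_zone_device(&adev);
	}
	return adev.svm_enabled;
}

/* Self-check covering both orderings. */
static void check_orderings(void)
{
	/* Order in the patch: KFD init runs before the zone device init. */
	assert(!svm_enabled_after_init(false));
	/* Order suggested above: zone device init first, then KFD init. */
	assert(svm_enabled_after_init(true));
}
```

In this model, initializing the zone device after KFD device init comes too late, because the support check has already been evaluated.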
Regards,
Philip
 	/*
 	 * Place those sysfs registering after `late_init`. As some of those
 	 * operations performed in `late_init` might affect the sysfs