[AMD Official Use Only - AMD Internal Distribution Only] In nps mode is somehow confusing. We'd like to differentiate recovery (*after* reset) from regular initialization. Is it possible to replace in nps mode check with more general approach? In regular initialization, set ras interface available in ip late init, while in recovery, let the flag set when recovery is completed. Regards, Hawking -----Original Message----- From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Tao Zhou Sent: Friday, November 8, 2024 7:14 PM To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx Cc: Zhou1, Tao <Tao.Zhou1@xxxxxxx> Subject: [PATCH 02/23] drm/amdgpu: do RAS init in NPS mode switch NPS mode switch will call gpu reset, but this is different from normal reset. Signed-off-by: Tao Zhou <tao.zhou1@xxxxxxx> --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 +++++++---- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index d69fcbb28b0e..635f020f8d9c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3293,7 +3293,7 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev) return r; } - if (!amdgpu_in_reset(adev)) + if (!amdgpu_in_reset(adev) || amdgpu_in_nps_switch(adev)) amdgpu_ras_set_error_query_ready(adev, true); amdgpu_device_set_cg_state(adev, AMD_CG_STATE_GATE); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index de1a55ae1d78..cbecf2380b51 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -1253,7 +1253,8 @@ int amdgpu_ras_bind_aca(struct amdgpu_device *adev, enum amdgpu_ras_block blk, struct ras_manager *obj; /* in resume phase, no need to create aca fs node */ - if (adev->in_suspend || amdgpu_in_reset(adev)) + if (adev->in_suspend || + (amdgpu_in_reset(adev) && !amdgpu_in_nps_switch(adev))) return 0; obj = get_ras_manager(adev, blk); @@ -3780,7 +3781,8 @@ int amdgpu_ras_block_late_init(struct amdgpu_device *adev, r = amdgpu_ras_feature_enable_on_boot(adev, ras_block, 1); if (r) { - if (adev->in_suspend || amdgpu_in_reset(adev)) { + if (adev->in_suspend || + (amdgpu_in_reset(adev) && !amdgpu_in_nps_switch(adev))) { /* in resume phase, if fail to enable ras, * clean up all ras fs nodes, and disable ras */ goto cleanup; @@ -3792,7 +3794,8 @@ int amdgpu_ras_block_late_init(struct amdgpu_device *adev, amdgpu_persistent_edc_harvesting(adev, ras_block); /* in resume phase, no need to create ras fs node */ - if (adev->in_suspend || amdgpu_in_reset(adev)) + if (adev->in_suspend || + (amdgpu_in_reset(adev) && !amdgpu_in_nps_switch(adev))) return 0; ras_obj = container_of(ras_block, struct amdgpu_ras_block_object, ras_comm); @@ -3922,7 +3925,7 @@ int amdgpu_ras_late_init(struct amdgpu_device *adev) amdgpu_ras_event_mgr_init(adev); if (amdgpu_ras_aca_is_supported(adev)) { - if (amdgpu_in_reset(adev)) { + if (amdgpu_in_reset(adev) && !amdgpu_in_nps_switch(adev)) { if (amdgpu_aca_is_enabled(adev)) r = amdgpu_aca_reset(adev); else -- 2.34.1