[AMD Official Use Only]

Hi Shaoyun,

Yes. From the SMU FW point of view, there is a difference between the
bare-metal and passthrough cases for SBR: on bare metal the SMU sees it as a
PCI reset, whereas in passthrough it sees it as a BIF reset. Within a BIF
reset, the SMU then needs to differentiate between older ASICs (where we do
BACO) and newer ones (where we do a mode 1 reset). Hence, in order for the SMU
to differentiate these scenarios, we are adding a new message.

I think I will rename the function from the current smu_set_light_sbr to
smu_handle_passthrough_sbr.

Regards,
Sashank

-----Original Message-----
From: Liu, Shaoyun <Shaoyun.Liu@xxxxxxx>
Sent: Friday, December 17, 2021 11:45 AM
To: Saye, Sashank <Sashank.Saye@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Saye, Sashank <Sashank.Saye@xxxxxxx>
Subject: RE: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

[AMD Official Use Only]

First, the name "heavy SBR" is confusing when you need to go through the
light SBR code path.

Second, we originally introduced light SBR because on older ASICs the FW
cannot synchronize the reset across the devices within the hive, so it depends
on the driver to sync the reset. From what I have heard, for Arcturus the FW
can actually sync the reset itself. I don't see a need to introduce the heavy
SBR message; it seems the SMU will do a full reset when it gets an SBR
request. Is there a different code path for the SMU to handle the reset for
XGMI in passthrough mode?

Regards,
Shaoyun.liu

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of sashank saye
Sent: Friday, December 17, 2021 10:33 AM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Saye, Sashank <Sashank.Saye@xxxxxxx>
Subject: [PATCH] drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling

For the Aldebaran chip passthrough case we need to inform the SMU about
special handling for SBR. On older chips we send LightSBR to the SMU; this
patch enables the same for Aldebaran. The slight difference, compared to
previous chips, is that on Aldebaran the SMU does a heavy reset on SBR.
Hence the word Heavy, instead of Light, is used in the SBR message so the
SMU can differentiate.

Signed-off-by: sashank saye <sashank.saye@xxxxxxx>
Change-Id: I79420e7352bb670d6f9696df97d7546f131b18fc
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c         |  4 ++--
 drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h       |  4 +++-
 drivers/gpu/drm/amd/pm/inc/smu_types.h             |  3 ++-
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 11 +++++++++++
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f31caec669e7..06aee23505b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2618,8 +2618,8 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)
 	if (r)
 		DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-	/* For XGMI + passthrough configuration on arcturus, enable light SBR */
-	if (adev->asic_type == CHIP_ARCTURUS &&
+	/* For XGMI + passthrough configuration on arcturus and aldebaran, enable light SBR */
+	if ((adev->asic_type == CHIP_ARCTURUS || adev->asic_type == CHIP_ALDEBARAN) &&
 	    amdgpu_passthrough(adev) &&
 	    adev->gmc.xgmi.num_physical_nodes > 1)
 		smu_set_light_sbr(&adev->smu, true);
diff --git a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
index 35fa0d8e92dd..ab66a4b9e438 100644
--- a/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/inc/aldebaran_ppsmc.h
@@ -102,7 +102,9 @@
 
 #define PPSMC_MSG_GfxDriverResetRecovery	0x42
 #define PPSMC_MSG_BoardPowerCalibration	0x43
-#define PPSMC_Message_Count			0x44
+#define PPSMC_MSG_HeavySBR			0x45
+#define PPSMC_Message_Count			0x46
+
 
 //PPSMC Reset Types
 #define PPSMC_RESET_TYPE_WARM_RESET 0x00
diff --git a/drivers/gpu/drm/amd/pm/inc/smu_types.h b/drivers/gpu/drm/amd/pm/inc/smu_types.h
index 18b862a90fbe..ff8a0bcbd290 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_types.h
@@ -229,7 +229,8 @@
 	__SMU_DUMMY_MAP(BoardPowerCalibration), \
 	__SMU_DUMMY_MAP(RequestGfxclk), \
 	__SMU_DUMMY_MAP(ForceGfxVid), \
-	__SMU_DUMMY_MAP(UnforceGfxVid),
+	__SMU_DUMMY_MAP(UnforceGfxVid), \
+	__SMU_DUMMY_MAP(HeavySBR),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)	SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 7433a051e795..f442950e9676 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -141,6 +141,7 @@ static const struct cmn2asic_msg_mapping aldebaran_message_map[SMU_MSG_MAX_COUNT
 	MSG_MAP(SetUclkDpmMode, PPSMC_MSG_SetUclkDpmMode, 0),
 	MSG_MAP(GfxDriverResetRecovery, PPSMC_MSG_GfxDriverResetRecovery, 0),
 	MSG_MAP(BoardPowerCalibration, PPSMC_MSG_BoardPowerCalibration, 0),
+	MSG_MAP(HeavySBR, PPSMC_MSG_HeavySBR, 0),
 };
 
 static const struct cmn2asic_mapping aldebaran_clk_map[SMU_CLK_COUNT] = {
@@ -1912,6 +1913,15 @@ static int aldebaran_mode2_reset(struct smu_context *smu)
 	return ret;
 }
 
+static int aldebaran_set_light_sbr(struct smu_context *smu, bool enable)
+{
+	int ret = 0;
+	//For Aldebaran chip, SMU would do a mode 1 reset as part of SBR hence we call it HeavySBR instead of light
+	ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_HeavySBR, enable ? 1 : 0, NULL);
+
+	return ret;
+}
+
 static bool aldebaran_is_mode1_reset_supported(struct smu_context *smu)
 {
 #if 0
@@ -2021,6 +2031,7 @@ static const struct pptable_funcs aldebaran_ppt_funcs = {
 	.get_gpu_metrics = aldebaran_get_gpu_metrics,
 	.mode1_reset_is_support = aldebaran_is_mode1_reset_supported,
 	.mode2_reset_is_support = aldebaran_is_mode2_reset_supported,
+	.set_light_sbr = aldebaran_set_light_sbr,
 	.mode1_reset = aldebaran_mode1_reset,
 	.set_mp1_state = aldebaran_set_mp1_state,
 	.mode2_reset = aldebaran_mode2_reset,
-- 
2.25.1
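
For readers following the thread, the selection logic discussed above can be
summarized in a small standalone sketch. This is not driver code: every type
and name below (sketch_asic, sketch_sbr_msg, sketch_device,
sketch_pick_sbr_msg) is a hypothetical stand-in, and the only facts assumed
are the ones stated in the patch, namely that a message is sent only for
XGMI + passthrough with more than one physical node, that Arcturus gets
LightSBR, and that Aldebaran gets HeavySBR.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the real amdgpu definitions. */
enum sketch_asic {
	SKETCH_ASIC_OTHER,
	SKETCH_ASIC_ARCTURUS,
	SKETCH_ASIC_ALDEBARAN,
};

enum sketch_sbr_msg {
	SKETCH_SBR_NONE,	/* no SBR message is sent to the SMU */
	SKETCH_SBR_LIGHT,	/* older chips: LightSBR */
	SKETCH_SBR_HEAVY,	/* Aldebaran: SMU does a mode 1 reset on SBR */
};

struct sketch_device {
	enum sketch_asic asic;
	bool passthrough;			/* device is passed through to a guest */
	unsigned int xgmi_physical_nodes;	/* number of devices in the XGMI hive */
};

/*
 * Mirror the condition in amdgpu_device_ip_late_init() from the patch:
 * only XGMI + passthrough configurations with more than one physical
 * node send an SBR message, and the ASIC decides which one.
 */
enum sketch_sbr_msg sketch_pick_sbr_msg(const struct sketch_device *dev)
{
	if (!dev->passthrough || dev->xgmi_physical_nodes <= 1)
		return SKETCH_SBR_NONE;

	switch (dev->asic) {
	case SKETCH_ASIC_ARCTURUS:
		return SKETCH_SBR_LIGHT;
	case SKETCH_ASIC_ALDEBARAN:
		return SKETCH_SBR_HEAVY;
	default:
		return SKETCH_SBR_NONE;
	}
}
```

In the patch itself this choice is not made in one place: the common code
calls smu_set_light_sbr() for both chips, and the per-ASIC pptable callback
decides which SMU message is actually sent.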