[Public] >-----Original Message----- >From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> >Sent: Monday, July 8, 2024 12:13 PM >To: Slivka, Danijel <Danijel.Slivka@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >Cc: Slivka, Danijel <Danijel.Slivka@xxxxxxx> >Subject: RE: [PATCH] drm/amd/pm: Ignore initial value in smu response register > >[Public] > >One problem is it's also bypassing a valid 0 response which usually means FW >may not have completed processing the previous message. > Bypassing zero value is added as 0 could represent garbage value. But adding HW_INIT state as initial state and ignoring any value when HW_INIT state is set. >What I thought was is it shouldn't even attempt sending a message if it >identified a FW hang. > >Is there a possibility to have the same problem whenever there is SRIOV full >access - as in before/after reset etc.? Yes, this could occur every time when VF is in full access. > >If state == FW_INIT, ignore response state before sending the message. >If there is no expected response to a message, make the state to FW_HANG. >This part is tricky as what qualifies as a FW hang could change based on the >specific SOC's message. Avoiding bool for this reason; to keep it open for having >other FW states. >If state == FW_HANG don't even attempt to send the message. > >Move FW state to FW_INIT whenever there is init/resume sequence - >hw_init/hw_resume? > Applied the suggestion and sent out new patch v2 [PATCH v2] drm/amd/pm: Ignore initial value in smu response register Thanks, BR, Danijel Slivka >Thanks, >Lijo >-----Original Message----- >From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Danijel >Slivka >Sent: Monday, July 8, 2024 1:37 PM >To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx >Cc: Slivka, Danijel <Danijel.Slivka@xxxxxxx> >Subject: [PATCH] drm/amd/pm: Ignore initial value in smu response register > >Why: >If the reg mmMP1_SMN_C2PMSG_90 is being written to during amdgpu driver >load or driver unload, subsequent amdgpu driver load will fail at smu_hw_init. >The default of mmMP1_SMN_C2PMSG_90 register at a clean environment is >0x1 and if value differs from expected, amdgpu driver load will fail. > >How to fix: >Ignore the initial value in smu response register before the first smu message is >sent, proceed further to send the message. If register holds >0x0 or an unexpected value after smu message was sent set fw_state_hang flag >and no further smu messages will be sent. > >Signed-off-by: Danijel Slivka <danijel.slivka@xxxxxxx> >--- > drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 1 + > drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +++++-- > 2 files changed, 6 insertions(+), 2 deletions(-) > >diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h >b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h >index a34c802f52be..bfe08fa0db6d 100644 >--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h >+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h >@@ -562,6 +562,7 @@ struct smu_context { > uint32_t smc_fw_if_version; > uint32_t smc_fw_version; > uint32_t smc_fw_caps; >+ bool smc_fw_state_hang; > > bool uploading_custom_pp_table; > bool dc_controlled_by_gpio; >diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c >b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c >index 5592fd825aa3..9e4e62dcbee7 100644 >--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c >+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c >@@ -421,7 +421,7 @@ int smu_cmn_send_smc_msg_with_param(struct >smu_context *smu, > if (poll) { > reg = __smu_cmn_poll_stat(smu); > res = __smu_cmn_reg2errno(smu, reg); >- if (reg == SMU_RESP_NONE || res == -EREMOTEIO) { >+ if ((reg == SMU_RESP_NONE || res == -EREMOTEIO) && >+smu->smc_fw_state_hang) { > __smu_cmn_reg_print_error(smu, reg, index, param, msg); > goto Out; > } >@@ -429,8 +429,11 @@ int smu_cmn_send_smc_msg_with_param(struct >smu_context *smu, > __smu_cmn_send_msg(smu, (uint16_t) index, param); > reg = __smu_cmn_poll_stat(smu); > res = __smu_cmn_reg2errno(smu, reg); >- if (res != 0) >+ if (res != 0) { >+ if (reg == SMU_RESP_NONE || res == -EREMOTEIO) >+ smu->smc_fw_state_hang = true; > __smu_cmn_reg_print_error(smu, reg, index, param, msg); >+ } > if (read_arg) { > smu_cmn_read_arg(smu, read_arg); > dev_dbg(adev->dev, "smu send message: %s(%d) param: 0x%08x, >resp: 0x%08x,\ >-- >2.34.1 >