This is a note to let you know that I've just added the patch titled drm/amd/display: Add smu write msg id fail retry process to the 6.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: drm-amd-display-add-smu-write-msg-id-fail-retry-process.patch and it can be found in the queue-6.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From 72105dcfa3d12b5af49311f857e3490baa225135 Mon Sep 17 00:00:00 2001 From: Fudong Wang <fudong.wang@xxxxxxx> Date: Fri, 11 Aug 2023 08:24:59 +0800 Subject: drm/amd/display: Add smu write msg id fail retry process From: Fudong Wang <fudong.wang@xxxxxxx> commit 72105dcfa3d12b5af49311f857e3490baa225135 upstream. A benchmark stress test (12-40 machines x 48hours) found that DCN315 has cases where DC writes to an indirect register to set the smu clock msg id, but when we go to read the same indirect register the returned msg id doesn't match with what we just set it to. So, to fix this retry the write until the register's value matches with the requested value. Cc: stable@xxxxxxxxxxxxxxx # 6.1+ Fixes: f94903996140 ("drm/amd/display: Add DCN315 CLK_MGR") Reviewed-by: Charlene Liu <charlene.liu@xxxxxxx> Acked-by: Hamza Mahfooz <hamza.mahfooz@xxxxxxx> Signed-off-by: Fudong Wang <fudong.wang@xxxxxxx> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_smu.c | 20 ++++++++++--- 1 file changed, 16 insertions(+), 4 deletions(-) --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_smu.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_smu.c @@ -32,6 +32,7 @@ #define MAX_INSTANCE 6 #define MAX_SEGMENT 6 +#define SMU_REGISTER_WRITE_RETRY_COUNT 5 struct IP_BASE_INSTANCE { @@ -134,6 +135,8 @@ static int dcn315_smu_send_msg_with_para unsigned int msg_id, unsigned int param) { uint32_t result; + uint32_t i = 0; + uint32_t read_back_data; result = dcn315_smu_wait_for_response(clk_mgr, 10, 200000); @@ -150,10 +153,19 @@ static int dcn315_smu_send_msg_with_para /* Set the parameter register for the SMU message, unit is Mhz */ REG_WRITE(MP1_SMN_C2PMSG_37, param); - /* Trigger the message transaction by writing the message ID */ - generic_write_indirect_reg(CTX, - REG_NBIO(RSMU_INDEX), REG_NBIO(RSMU_DATA), - mmMP1_C2PMSG_3, msg_id); + for (i = 0; i < SMU_REGISTER_WRITE_RETRY_COUNT; i++) { + /* Trigger the message transaction by writing the message ID */ + generic_write_indirect_reg(CTX, + REG_NBIO(RSMU_INDEX), REG_NBIO(RSMU_DATA), + mmMP1_C2PMSG_3, msg_id); + read_back_data = generic_read_indirect_reg(CTX, + REG_NBIO(RSMU_INDEX), REG_NBIO(RSMU_DATA), + mmMP1_C2PMSG_3); + if (read_back_data == msg_id) + break; + udelay(2); + smu_print("SMU msg id write fail %x times. \n", i + 1); + } result = dcn315_smu_wait_for_response(clk_mgr, 10, 200000); Patches currently in stable-queue which might be from fudong.wang@xxxxxxx are queue-6.4/drm-amd-display-add-smu-write-msg-id-fail-retry-process.patch