[Public]

Sorry for missing this. Busy with new ASIC bring-up. Let's adopt this for
now, and we can revise it later if we do see some corner case.

Reviewed-by: Evan Quan <evan.quan@xxxxxxx>

Thanks,
Evan

> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> Sent: Tuesday, March 8, 2022 1:58 PM
> To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander
> <Alexander.Deucher@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>
> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
>
> [Public]
>
> Hi Evan,
>
> Any comments on this?
>
> Thanks,
> Lijo
>
> -----Original Message-----
> From: Lazar, Lijo
> Sent: Friday, February 25, 2022 7:30 PM
> To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander
> <Alexander.Deucher@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>
> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
>
> On 2/25/2022 6:33 PM, Quan, Evan wrote:
> > [AMD Official Use Only]
> >
> >> -----Original Message-----
> >> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> >> Sent: Friday, February 25, 2022 3:43 PM
> >> To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander
> >> <Alexander.Deucher@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>
> >> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> >>
> >> On 2/25/2022 1:02 PM, Quan, Evan wrote:
> >>> [AMD Official Use Only]
> >>>
> >>>> -----Original Message-----
> >>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> >>>> Sent: Friday, February 25, 2022 2:03 PM
> >>>> To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >>>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander
> >>>> <Alexander.Deucher@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>
> >>>> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> >>>>
> >>>> On 2/25/2022 11:25 AM, Quan, Evan wrote:
> >>>>> [AMD Official Use Only]
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> >>>>>> Sent: Friday, February 25, 2022 1:47 PM
> >>>>>> To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >>>>>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander
> >>>>>> <Alexander.Deucher@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>
> >>>>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> >>>>>>
> >>>>>> [AMD Official Use Only]
> >>>>>>
> >>>>>>> That is, the caller can perform something like issuing the same
> >>>>>>> message again without a prerequisites check on PMFW busy
> >>>>>>
> >>>>>> This patch expects this method. The caller may try to resend the
> >>>>>> message again. As part of message sending, the driver first checks
> >>>>>> the response register. The current logic blocks sending any message
> >>>>>> if it sees 0xFC in the response register; this patch addresses that.
> >>>>> [Quan, Evan] Yes, I know. But the caller here could be another one.
> >>>>> I mean, another caller may step in.
> >>>>>
> >>>> That shouldn't cause an issue for the second caller if it gets the
> >>>> message mutex. The second caller should also be able to send a message
> >>>> if the PMFW has become free by that time. The first caller can retry
> >>>> when it gets the message mutex back. The FW doesn't maintain any state
> >>>> for the 0xFC response; any other message may be sent after that. If
> >>>> the driver keeps state based on two callers, that is a logic problem
> >>>> in the driver. I don't think we have any flow like that.
> >>> [Quan, Evan] Yeah, but there may be some case where the messages issued
> >>> by the two callers have a dependency.
> >>> That means the message issued by the 2nd caller should come only after
> >>> the 1st one. The one I can think of is that the "EnableAllSmuFeatures"
> >>> message should come after the "SetAllowedFeatures" message.
> >>> Although that should not cause any problem, I'm not sure whether there
> >>> is any other similar case.
> >>>
> >>> What I suggest is something like below. We just do it again in
> >>> smu_cmn_send_smc_msg_with_param() on PMFW busy.
> >>>
> >>> int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
> >>>                                     enum smu_message_type msg,
> >>>                                     uint32_t param,
> >>>                                     uint32_t *read_arg)
> >>> {
> >>>         ...
> >>>         ...
> >>>         mutex_lock(&smu->message_lock);
> >>>         reg = __smu_cmn_poll_stat(smu);
> >>>         res = __smu_cmn_reg2errno(smu, reg);
> >>>         if (reg == SMU_RESP_NONE ||
> >>>             reg == SMU_RESP_BUSY_OTHER ||
> >>>             res == -EREMOTEIO) {
> >>>                 __smu_cmn_reg_print_error(smu, reg, index, param, msg);
> >>>                 goto Out;
> >>>         }
> >>> +retry:
> >>>         __smu_cmn_send_msg(smu, (uint16_t) index, param);
> >>>         reg = __smu_cmn_poll_stat(smu);
> >>>         res = __smu_cmn_reg2errno(smu, reg);
> >>> +       if (reg == SMU_RESP_BUSY_OTHER) {
> >>> +               mdelay(1);
> >>> +               goto retry;
> >>> +       }
> >>
> >> I suppose the retry option should be left to the caller. Regardless of
> >> retry or not, the patch is still valid.
> >>
> >> Example situation -
> >>
> >> rocm-smi is trying to get metrics, and another app is trying to set a
> >> performance profile. If the metrics fetch fails, and even retries of
> >> the metrics fetch fail after some loops, rocm-smi is free to fetch the
> >> metrics again after, say, 5s. That also shouldn't prevent the second
> >> app from sending the performance profile message, and that app may also
> >> retry later.
> > [Quan, Evan] I have no doubt the second app may still be able to send
> > the performance profile message.
> > However, the metrics data retrieved by rocm-smi will be different under
> > such a scenario. That is, after the performance profile setting, the
> > clock frequencies may go up.
> > If the first app is sensitive to that (suppose it wants to compare the
> > clock frequencies before and after the performance profile setting),
> > that will be a problem.
>
> Ah, my intention was to describe a different use case; probably a bad
> example.
>
> What I intended is -
>
> rocm-smi is running periodically, collecting metrics data, and another
> app, which doesn't care about the background data collection, is trying
> to do something else. Thus a metrics fetch failure shouldn't prevent the
> other app from changing the performance profile or whatever it intended
> to do. Also, rocm-smi may be able to fetch data during the next polling
> interval, when the PMFW gets relatively free.
>
> > I reconsidered this a bit. Can we add one more parameter to
> > smu_cmn_send_smc_msg_with_param()?
> > That parameter can tell what the caller wants (retry or abort) under
> > the PMFW-busy scenario.
>
> There is one complication to this. The same message could return 0xFC
> under different conditions, and it may be fine for situation A but not
> for situation B. For example, during driver load, or init after a reset,
> we don't expect PMFW to be busy. The driver may send a message to get
> metrics data to check, say, whether smart shift is enabled or not. A
> failure with 0xFC is not a good sign at that time, and we should abort
> there.
>
> The same message sent during runtime (like rocm-smi data collection) may
> fail with 0xFC due to workload conditions.
>
> Adding another parameter will complicate things, as we need to check
> every condition of message usage. In the present case, a message failure
> with 0xFC is treated as a failure during init and we don't proceed. For
> runtime use, apps always have a choice whether to retry the same API or
> not.
>
> Thanks,
> Lijo
>
> > BR
> > Evan
> >>
> >> Thanks,
> >> Lijo
> >>
> >>>         ...
> >>>         ...
> >>> }
> >>>
> >>> BR
> >>> Evan
> >>>>
> >>>> Basically, 0xFC is not a valid pre-condition check for sending any
> >>>> message. As per the PMFW team, it only means that PMFW was busy when
> >>>> a previous message was sent, and PMFW won't change the response
> >>>> status when it becomes free.
> >>>>
> >>>> Thanks,
> >>>> Lijo
> >>>>
> >>>>> BR
> >>>>> Evan
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Lijo
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Quan, Evan <Evan.Quan@xxxxxxx>
> >>>>>> Sent: Friday, February 25, 2022 11:07 AM
> >>>>>> To: Lazar, Lijo <Lijo.Lazar@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >>>>>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander
> >>>>>> <Alexander.Deucher@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>
> >>>>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> >>>>>>
> >>>>>> [AMD Official Use Only]
> >>>>>>
> >>>>>> This may introduce some problems for two-caller scenarios. That is,
> >>>>>> the 2nd one will still proceed even if the 1st one was already
> >>>>>> blocked.
> >>>>>> Maybe the logic here should be performed by the caller. That is,
> >>>>>> the caller can perform something like issuing the same message
> >>>>>> again without a prerequisites check on PMFW busy.
> >>>>>> Or we can just update the smu_cmn_send_smc_msg APIs to give it
> >>>>>> another try on PMFW busy.
> >>>>>>
> >>>>>> BR
> >>>>>> Evan
> >>>>>>> -----Original Message-----
> >>>>>>> From: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> >>>>>>> Sent: Friday, February 25, 2022 12:22 PM
> >>>>>>> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >>>>>>> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Deucher, Alexander
> >>>>>>> <Alexander.Deucher@xxxxxxx>; Wang, Yang(Kevin)
> >>>>>>> <KevinYang.Wang@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>
> >>>>>>> Subject: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> >>>>>>>
> >>>>>>> When PMFW is really busy, it will respond with 0xFC. However, it
> >>>>>>> doesn't change the response register state when it becomes free.
> >>>>>>> The driver should retry and proceed to send the message if the
> >>>>>>> response status is 0xFC.
> >>>>>>>
> >>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@xxxxxxx>
> >>>>>>> ---
> >>>>>>>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
> >>>>>>>  1 file changed, 2 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>>>> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>>>> index 590a6ed12d54..92161b9d8c1a 100644
> >>>>>>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>>>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>>>> @@ -297,7 +297,6 @@ int smu_cmn_send_msg_without_waiting(struct smu_context *smu,
> >>>>>>>  	reg = __smu_cmn_poll_stat(smu);
> >>>>>>>  	res = __smu_cmn_reg2errno(smu, reg);
> >>>>>>>  	if (reg == SMU_RESP_NONE ||
> >>>>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
> >>>>>>>  	    res == -EREMOTEIO)
> >>>>>>>  		goto Out;
> >>>>>>>  	__smu_cmn_send_msg(smu, msg_index, param);
> >>>>>>> @@ -391,7 +390,6 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
> >>>>>>>  	reg = __smu_cmn_poll_stat(smu);
> >>>>>>>  	res = __smu_cmn_reg2errno(smu, reg);
> >>>>>>>  	if (reg == SMU_RESP_NONE ||
> >>>>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
> >>>>>>>  	    res == -EREMOTEIO) {
> >>>>>>>  		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
> >>>>>>>  		goto Out;
> >>>>>>> --
> >>>>>>> 2.25.1