Re: [PATCH 20/20] interconnect: qcom: Divide clk rate by src node bus width

Konrad Dybcio <konrad.dybcio@xxxxxxxxxx> · Thu, 1 Jun 2023 14:43:50 +0200

On 30.05.2023 21:02, Stephan Gerhold wrote:
> On Tue, May 30, 2023 at 06:32:04PM +0200, Konrad Dybcio wrote:
>> On 30.05.2023 12:20, Konrad Dybcio wrote:
>>> Ever since the introduction of SMD RPM ICC, we've been dividing the
>>> clock rate by the wrong bus width. This has resulted in:
>>>
>>> - setting wrong (mostly too low) rates, affecting performance
>>>   - most often /2 or /4
>>>   - things like DDR never hit their full potential
>>>   - the rates were only correct if src bus width == dst bus width
>>>     for all src, dst pairs on a given bus
>>>
>>> - Qualcomm using the same wrong logic in their BSP driver in msm-5.x
>>>   that ships in production devices today
>>>
>>> - me losing my sanity trying to find this
>>>
>>> Resolve it by using dst_qn, if it exists.
>>>
>>> Fixes: 5e4e6c4d3ae0 ("interconnect: qcom: Add QCS404 interconnect provider driver")
>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@xxxxxxxxxx>
>>> ---
>> The problem is deeper.
>>
>> Chatting with Stephan (+CC), we tackled a few issues (that I will send
>> fixes for in v2):
>>
>> 1. qcom_icc_rpm_set() should take per-node (src_qn->sum_avg, dst_qn->sum_avg)
>>    and NOT aggregated bw (unless you want ALL of your nodes on a given provider
>>    to "go very fast")
>>
>> 2. the aggregate bw/clk rate calculation should use the node-specific bus widths
>>    and not only the bus width of the src/dst node, otherwise the average bw
>>    values will be utterly meaningless
>>
> 
> The peak bandwidth / clock rate is wrong as well if you have two paths
> with different buswidths on the same bus/NoC. (If someone is interested
> in details I can post my specific example I had in the chat, it shows
> this more clearly.)
agg_peak takes care of that, I believe..

> 
>> 3. thanks to (1) and (2) qcom_icc_bus_aggregate() can be remodeled to instead
>>    calculate the clock rates for the two rpm contexts, which we can then max()
>>    and pass on to the ratesetting call
>>
> 
> Sounds good.
> 
>>
>> ----8<---- Cutting off Stephan's seal of approval, this is my thinking ----
>>
>> 4. I *think* Qualcomm really made a mistake in their msm-5.4 driver where they
>>    took most of the logic from the current -next state and should have been
>>    setting the rate based on the *DST* provider, or at least that's my
>>    understanding trying to read the "known good" msm-4.19 driver
>>    (which remembers msm-3.0 lol).. Or maybe we should keep src but ensure there's
>>    also a final (dst, dst) vote cast:
>>
>> provider->inter_set = false // current state upstream
>>
>> setting apps_proc<->slv_bimc_snoc
>> setting mas_bimc_snoc<->slv_snoc_cnoc
>> setting mas_snoc_cnoc<->qhs_sdc2
>>
>>
>> provider->inter_set = true // I don't think there's effectively a difference?
>>
>> setting apps_proc<->slv_bimc_snoc
>> setting slv_bimc_snoc<->mas_bimc_snoc
>> setting mas_bimc_snoc<->slv_snoc_cnoc
>> setting slv_snoc_cnoc<->mas_snoc_cnoc
>> setting mas_snoc_cnoc<->qhs_sdc2
>>
> 
> I think with our proposed changes above it does no longer matter if a
> node is passed as "src" or "dst". This means in your example above you
> just waste additional time setting the bandwidth twice for
> slv_bimc_snoc, mas_bimc_snoc, slv_snoc_cnoc and mas_snoc_cnoc.
> The final outcome is the same with or without "inter_set".
Yeah I guess due to the fact that two "real" nodes are always
connected by a set of "gateway" nodes, the rate will be applied..

I am however not sure if we're supposed to set the bandwidth
(via qcom_icc_rpm_set()) on all of them..

Konrad
> 
> Thanks,
> Stephan