Re: [PATCH] arm64: dts: rockchip: Remove overdrive-mode OPPs from RK3588J SoC dtsi

Dragan Simic <dsimic@xxxxxxxxxxx> · Sun, 23 Mar 2025 11:19:56 +0100

Hello Quentin,

Thanks for your comments!  Please see some responses below.

On 2025-03-21 10:53, Quentin Schulz wrote:
On 3/21/25 4:28 AM, Dragan Simic wrote:
The differences in the vendor-approved CPU and GPU OPPs for the 
standard
Rockchip RK3588 variant [1] and the industrial Rockchip RK3588J 
variant [2]
come from the latter, presumably, supporting an extended temperature 
range
that's usually associated with industrial applications, despite the 
two SoC
variant datasheets specifying the same upper limit for the allowed 
ambient
temperature for both variants.  However, the lower temperature limit 
is

RK3588 is rated for 0-80°C, RK3588J for -40-85°C, c.f. Recommended
Operating Conditions, Table 3-2, Ambient Operating Temperature.

Indeed, which is why I specifically wrote "specifying the same upper
limit", because having a lower negative temperature limit could hardly
put the RK3588J in danger of overheating or running hotter. :)

specified much lower for the RK3588J variant. [1][2]

To be on the safe side and to ensure maximum longevity of the RK3588J 
SoCs,
only the CPU and GPU OPPs that are declared by the vendor to be always 
safe
for this SoC variant may be provided.  As explained by the vendor [3] 
and
according to its datasheet, [2] the RK3588J variant can actually run 
safely
at higher CPU and GPU OPPs as well, but only when not enjoying the 
assumed
extended temperature range that the RK3588J, as an SoC variant 
targeted

"only when not enjoying the assumed extended temperature range" is
extrapolated by me/us and not confirmed by Rockchip themselves. I've
asked for a statement on what "industrial environment" they specify in
the Normal Mode explanation means since it's the only time they use
the term. I've yet to receive an answer. The only thing Rockchip in
their datasheet is that the overdrive mode will shorten lifetime when
used for a long time, especially in high temperature conditions. It's
not clear whether we can use the overdrive mode even within the RK3588
typical range of operation.

True.  I'll see to rephrase the patch description a bit in the v2,
to avoid this kind of speculation.  I mean, perhaps the speculation
is right, but it hasn't been confirmed officially by Rockchip.

specifically at industrial applications, is made (or binned) for.

Thus, only the CPU and GPU OPPs that are specified by the vendor to be 
safe
throughout the entire RK3588J's extended temperature range may be 
provided,
while anyone who actually can ensure that their RK3588J-based board is
never going to run within the extended temperature range, may probably
safely apply a DT overlay that adds the higher CPU and GPU OPPs.  As 
we

Wouldn't say "safely" here, Rockchip still says that running overdrive
mode for a long time may shorten lifetime... that followed by
"especially in high temperature condition" doesn't mean that operating
the RK3588J within the RK3588 typical range is safe to do.

Also true, and I'll also try to address that in the v2.

obviously can't know what will be the runtime temperature conditions 
for
a particular board, we may provide only the always-safe OPPs.

With all this and the downstream RK3588(J) DT definitions [4][5] in 
mind,
let's delete the RK3588J CPU and GPU OPPs that are not considered 
belonging
to the normal operation mode for this SoC variant.  To quote the 
RK3588J
datasheet [2], "normal mode means the chipset works under safety 
voltage
and frequency;  for the industrial environment, highly recommend to 
keep in
normal mode, the lifetime is reasonably guaranteed", while "overdrive 
mode
brings higher frequency, and the voltage will increase accordingly;  
under
the overdrive mode for a long time, the chipset may shorten the 
lifetime,
especially in high temperature condition".

To sum up the RK3588J datasheet [2] and the vendor-provided DTs, 
[4][5]
the maximum allowed CPU core and GPU frequencies are as follows:

    IP core    | Normal mode | Overdrive mode
   ------------+-------------+----------------
    Cortex-A55 |   1,296 MHz |      1,704 MHz
    Cortex-A76 |   1,608 MHz |      2,016 MHz
    GPU        |     700 MHz |        850 MHz


The NPU too is impacted by this, so maybe list it anyway here? Even if
we don't support it right now and don't have OPPs for it.

Agreed, will add in the v2.  Having all that information in a single
place can only be helpful.

Unfortunately, when it comes to the actual voltages for the RK3588J 
CPU and
GPU OPPs, there's a discrepancy between the RK3588J datasheet [2] and 
the
downstream kernel code. [4][5]  The RK3588J datasheet states that "the 
max.
working voltage of CPU/GPU/NPU is 0.75 V under the normal mode", while 
the
downstream kernel code actually allows voltage ranges that go up to 
0.95 V,
which is still within the voltage range allowed by the datasheet.  
However,
the RK3588J datasheet also tells us to "strictly refer to the software
configuration of SDK and the hardware reference design", so let's 
embrace
the voltage ranges provided by the downstream kernel code, which also
prevents the undesirable theoretical outcome of ending up with no 
usable
OPPs on a particular board, as a result of the board's voltage 
regulator(s)
being unable to deliver the exact voltages, for whatever reason.

The above-described voltage ranges for the RK3588J CPU OPPs remain 
taken
from the downstream kernel code [4][5] by picking the highest, 
worst-bin
values, which ensure that all RK3588J bins will work reliably.  Yes, 
with
some power inevitably wasted as unnecessarily generated heat, but the
reliability is paramount, together with the longevity.  This 
deficiency
may be revisited separately at some point in the future.

The provided RK3588J CPU OPPs follow the slightly debatable "provide 
only
the highest-frequency OPP from the same-voltage group" approach that's 
been

Interesting that we went for a different strategy for the GPU OPPs :)

Good point, and I'm fully aware of that. :)  Actually, I'm rather
sure that omitting the additional CPU OPPs does no good to us, but
I didn't want to argue about that when they were dropped originally,
before I can have some hard numbers to prove it in a repeatable way.

established earlier, [6] as a result of the "same-voltage, 
lower-frequency"
OPPs being considered inefficient from the IPA governor's standpoint, 
which
may also be revisited separately at some point in the future.

[1] 
https://wiki.friendlyelec.com/wiki/images/e/ee/Rockchip_RK3588_Datasheet_V1.6-20231016.pdf
[2] 
https://wmsc.lcsc.com/wmsc/upload/file/pdf/v2/lcsc/2403201054_Rockchip-RK3588J_C22364189.pdf
[3] 
https://lore.kernel.org/linux-rockchip/e55125ed-64fb-455e-b1e4-cebe2cf006e4@xxxxxxxxx/T/#u
[4] 
https://raw.githubusercontent.com/rockchip-linux/kernel/604cec4004abe5a96c734f2fab7b74809d2d742f/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
[5] 
https://raw.githubusercontent.com/rockchip-linux/kernel/604cec4004abe5a96c734f2fab7b74809d2d742f/arch/arm64/boot/dts/rockchip/rk3588j.dtsi
[6] 
https://lore.kernel.org/all/20240229-rk-dts-additions-v3-5-6afe8473a631@xxxxxxxxx/

Fixes: 667885a68658 ("arm64: dts: rockchip: Add OPP data for CPU cores 
on RK3588j")
Fixes: a7b2070505a2 ("arm64: dts: rockchip: Split GPU OPPs of RK3588 
and RK3588j")
Cc: stable@xxxxxxxxxxxxxxx
Cc: Heiko Stuebner <heiko@xxxxxxxxx>
Cc: Alexey Charkov <alchark@xxxxxxxxx>
Helped-by: Quentin Schulz <quentin.schulz@xxxxxxxxx>

Reported-by/Suggested-by?

I don't see Helped-by in
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

I see 2496b2aaacf137250f4ca449f465e2cadaabb0e8 got the Helped-by
replaced by a Suggested-by for example, but I see other patches with
Helped-by... if that is a standard trailer for kernel patches, then
maybe we should add it to that doc?

Actually, I already tried to get the Helped-by tag added to the
kernel documentation, by submitting a small patch series. [*]
Unfortunately, it got rejected. :/

However, Heiko accepts Helped-by tags and nobody higher up the
tree seems to complain, so we should be fine. :)  It isn't the
case with all maintainers, though.

[*] 
https://lore.kernel.org/all/cover.1730874296.git.dsimic@xxxxxxxxxxx/T/#u

Signed-off-by: Dragan Simic <dsimic@xxxxxxxxxxx>
---
  arch/arm64/boot/dts/rockchip/rk3588j.dtsi | 53 
++++++++---------------
  1 file changed, 17 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/boot/dts/rockchip/rk3588j.dtsi 
b/arch/arm64/boot/dts/rockchip/rk3588j.dtsi
index bce72bac4503..3045cb3bd68c 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588j.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588j.dtsi
@@ -11,74 +11,59 @@ cluster0_opp_table: opp-table-cluster0 {
  		compatible = "operating-points-v2";
  		opp-shared;
  -		opp-1416000000 {
-			opp-hz = /bits/ 64 <1416000000>;
+		opp-1200000000 {
+			opp-hz = /bits/ 64 <1200000000>;
  			opp-microvolt = <750000 750000 950000>;
  			clock-latency-ns = <40000>;
  			opp-suspend;
  		};
-		opp-1608000000 {
-			opp-hz = /bits/ 64 <1608000000>;
-			opp-microvolt = <887500 887500 950000>;
-			clock-latency-ns = <40000>;
-		};
-		opp-1704000000 {
-			opp-hz = /bits/ 64 <1704000000>;
-			opp-microvolt = <937500 937500 950000>;
+		opp-1296000000 {
+			opp-hz = /bits/ 64 <1296000000>;
+			opp-microvolt = <775000 775000 950000>;

Got tricked by this one.

In the Rockchip vendor kernel, the opp-microvolt is 750000 750000
950000, so the same as CPU OPP 1.2GHz. However, the opp-microvolt-L1
and L0 are higher than that. Only a couple of the OPPs in vendor
kernel actually have opp-microvolt-L* higher than opp-microvolt, that
is a noteworthy oddity for anyone reviewing this patch :)

Anyway, that is correct, we take the highest voltage among all defined
opp-microvolt* properties.

Indeed, and I also almost got tricked by the same OPP. :)

I only have comments on the commit log, the diff is fine so:

Reviewed-by: Quentin Schulz <quentin.schulz@xxxxxxxxx>

Thanks once again for your detailed review! :)