Hi Sebastian! On Mon, Mar 4, 2024 at 9:51 PM Sebastian Reichel <sebastian.reichel@xxxxxxxxxxxxx> wrote: > > Hi, > > On Thu, Feb 29, 2024 at 11:26:31PM +0400, Alexey Charkov wrote: > > This enables thermal monitoring and CPU DVFS on RK3588(s), as well as > > active cooling on Radxa Rock 5B via the provided PWM fan. > > > > Some RK3588 boards use separate regulators to supply CPUs and their > > respective memory interfaces, so this is handled by coupling those > > regulators in affected boards' device trees to ensure that their > > voltage is adjusted in step. > > > > In this revision of the series I chose to enable TSADC for all boards > > at .dtsi level, because: > > - The defaults already in .dtsi should work for all users, given that > > the CRU based resets don't need any out-of-chip components, and > > the CRU vs. PMIC reset is pretty much the only thing a board might > > have to configure / override there > > - The boards that have TSADC_SHUT signal wired to the PMIC reset line > > can still choose to override the reset logic in their .dts. Or stay > > with CRU based resets, as downstream kernels do anyway > > - The on-by-default approach helps ensure thermal protections are in > > place (emergency reset and throttling) for any board even with a > > rudimentary .dts, and thus lets us introduce CPU DVFS with better > > peace of mind > > > > Fan control on Rock 5B has been split into two intervals: let it spin > > at the minimum cooling state between 55C and 65C, and then accelerate > > if the system crosses the 65C mark - thanks to Dragan for suggesting. > > This lets some cooling setups with beefier heatsinks and/or larger > > fan fins to stay in the quietest non-zero fan state while still > > gaining potential benefits from the airflow it generates, and > > possibly avoiding noisy speeds altogether for some workloads. > > > > OPPs help actually scale CPU frequencies up and down for both cooling > > and performance - tested on Rock 5B under varied loads. I've split > > the patch into two parts: the first containing those OPPs that seem > > to be no-regret with general consensus during v1 review [2], while > > the second contains OPPs that cause frequency reductions without > > accompanying decrease in CPU voltage. There seems to be a slight > > performance gain in some workload scenarios when using these, but > > previous discussion was inconclusive as to whether they should be > > included or not. Having them as separate patches enables easier > > comparison and partial reversion if people want to test it under > > their workloads, and also enables the first 'no-regret' part to be > > merged to -next while the jury is still out on the second one. > > > > [1] https://lore.kernel.org/linux-rockchip/1824717.EqSB1tO5pr@bagend/T/#ma2ab949da2235a8e759eab22155fb2bc397d8aea > > [2] https://lore.kernel.org/linux-rockchip/CABjd4YxqarUCbZ-a2XLe3TWJ-qjphGkyq=wDnctnEhdoSdPPpw@xxxxxxxxxxxxxx/T/#m49d2b94e773f5b532a0bb5d3d7664799ff28cc2c > > > > Signed-off-by: Alexey Charkov <alchark@xxxxxxxxx> > > --- > > Changes in v3: > > - Added regulator coupling for EVB1 and QuartzPro64 > > - Enabled the TSADC for all boards in .dtsi, not just Rock 5B (thanks ChenYu) > > - Added comments regarding two passive cooling trips in each zone (thanks Dragan) > > - Fixed active cooling map numbering for Radxa Rock 5B (thanks Dragan) > > - Dropped Daniel's Acked-by tag from the Rock 5B fan patch, as there's been quite some > > churn there since the version he acknowledged > > - Link to v2: https://lore.kernel.org/r/20240130-rk-dts-additions-v2-0-c6222c4c78df@xxxxxxxxx > > > > Changes in v2: > > - Dropped the rfkill patch which Heiko has already applied > > - Set higher 'polling-delay-passive' (100 instead of 20) > > - Name all cooling maps starting from map0 in each respective zone > > - Drop 'contribution' properties from passive cooling maps > > - Link to v1: https://lore.kernel.org/r/20240125-rk-dts-additions-v1-0-5879275db36f@xxxxxxxxx > > > > --- > > Alexey Charkov (5): > > arm64: dts: rockchip: enable built-in thermal monitoring on RK3588 > > arm64: dts: rockchip: enable automatic active cooling on Rock 5B > > arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588 > > arm64: dts: rockchip: Add OPP data for CPU cores on RK3588 > > arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs > > > > arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts | 12 + > > .../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 12 + > > arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 30 +- > > arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 385 ++++++++++++++++++++- > > 4 files changed, 437 insertions(+), 2 deletions(-) > > I'm too busy to have a detailed review of this series right now, but > I pushed it to our CI and it results in a board reset at boot time: > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950 > > I also pushed just the first three patches (i.e. without OPP / > cpufreq) and that boots fine: > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953 Thank you for testing these! I've noticed in the boot log that the CI machine uses some u-boot 2023.07 - is that a downstream one? Any chance to compare it to 2023.11 or 2024.01 from your (Collabora) integration tree? I use 2023.11 from your integration tree, with a binary bl31, and I'm not getting those resets even under prolonged heavy load (I rebuild Chromium with 8 concurrent compilation jobs as the stress test - that's 14 hours of heavy CPU, memory and IO use). Would be interesting to understand if it's just a 'lucky' SoC specimen on my side, or if there is some dark magic happening differently on my machine vs. your CI machine. Thinking that maybe if your CI machine uses a downstream u-boot it might be leaving some extra hardware running (PVTM?) which might do weird stuff when TSADC/clocks/voltages get readjusted by the generic cpufreq driver?.. > Note, that OPP / cpufreq works on the same boards in the CI when > using the ugly-and-not-for-upstream cpufreq driver: > > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd > > My best guess right now is, that this is related to the generic > driver obviously not updating the GRF read margin registers. If it was about memory read margins I believe I would have been unlikely to get my machine to work reliably under heavy load with the default ones, but who knows... Best regards, Alexey