On Tue, 28 Nov 2023 at 13:42, Stephan Gerhold <stephan@xxxxxxxxxxx> wrote:
>
> Hi Uffe,
>
> On Mon, Oct 16, 2023 at 04:47:52PM +0200, Ulf Hansson wrote:
> > [...]
> > > > > - MSM8916 (CPR+RPMPD):
> > > > >   https://github.com/msm8916-mainline/linux/commit/8880f39108206d7a60a0a8351c0373bddf58657c
> > > >
> > > > This looks a bit odd to me. Does a CPU really have four different
> > > > power-domains, where three of them are performance-domains?
> > > >
> > >
> > > Good question. I think we're largely entering "uncharted territory" with
> > > these questions, I can just try to answer it the best I can from the
> > > limited documentation and knowledge I have. :)
> > >
> > > The CPU does indeed use four different power domains. There also seem to
> > > be additional power switches that gate power for some components without
> > > having to turn off the entire supply.
> > >
> > > I'll list them twice from two points of view: Once mapping component ->
> > > power domain, then again showing each power domain separately to make it
> > > more clear. At the end I also want to make clear that MSM8909 (with the
> > > "single" power domain) is actually exactly the same SoC design, just
> > > with different regulators supplying the power domains.
> > >
> > > It's totally fine if you just skim over it. I'm listing it in detail
> > > also as reference for myself. :D
> > >
> > > # Components
> > > - SoC
> > >   - CPU subsystem ("APPS")
> > >     - CPU cluster
> > >       - 4x CPU core (logic and L1 cache) -> VDD_APC
> > >       - Shared L2 cache
> > >         - Logic -> VDD_APC
> > >         - Memory -> VDD_MX
> > >     - CPU clock controller (logic) -> VDD_CX
> > >       - Provides CPU frequency from different clock sources
> > >       - L2 cache runs at 1/2 of CPU frequency
> > >       => Both VDD_APC and VDD_MX must be scaled based on frequency
> > >   - CPU PLL clock source
> > >     - Generates the higher (GHz) CPU frequencies
> > >     - Logic (?, unsure) -> VDD_CX
> > >     - ??? -> VDD_SR2_APPS_PLL
> > >     => VDD_CX must be scaled based on PLL frequency
> > >
> > > # Power Domains
> > > ## VDD_APC
> > > - dedicated for CPU
> > > - powered off completely in deepest cluster cpuidle state
> > >
> > > - per-core power switch (per-core cpuidle)
> > >   - CPU logic
> > >   - L1 cache controller/logic and maybe memory(?, unsure)
> > > - shared L2 cache controller/logic
> > >
> > > => must be scaled based on CPU frequency
> > >
> > > ## VDD_MX
> > > - global SoC power domain for "on-chip memories"
> > > - always on, reduced to minimal voltage when entire SoC is idle
> > >
> > > - power switch (controlled by deepest cluster cpuidle state?, unsure)
> > >   - L2 cache memory
> > >
> > > => must be scaled based on L2 frequency (=> 1/2 CPU frequency)
> > >
> > > ## VDD_CX
> > > - global SoC power domain for "digital logic"
> > > - always on, reduced to minimal voltage when entire SoC is idle
> > > - voting for VDD_CX in the RPM firmware also affects VDD_MX performance
> > >   state (firmware implicitly sets VDD_MX >= VDD_CX)
> > >
> > > - CPU clock controller logic, CPU PLL logic(?, unsure)
> > >
> > > => must be scaled based on CPU PLL frequency
> > >
> > > ## VDD_SR2_APPS_PLL
> > > - global SoC power domain for CPU clock PLLs
> > > - on MSM8916: always on with constant voltage
> > >
> > > => ignored in Linux at the moment
> > >
> > > # Power Domain Regulators
> > > These power domains are literally input pins on the SoC chip. In theory
> > > one could connect any suitable regulator to each of those.
> > > In practice there are just a couple of standard reference designs that
> > > everyone uses:
> > >
> > > ## MSM8916 (SoC) + PM8916 (PMIC)
> > > We need to scale 3 power domains together with cpufreq:
> > >
> > > - VDD_APC (CPU logic) = &pm8916_spmi_s2 (via CPR)
> > > - VDD_MX (L2 memory) = &pm8916_l3 (via RPMPD: MSM8916_VDDMX)
> > > - VDD_CX (CPU PLL) = &pm8916_s1 (via RPMPD: MSM8916_VDDCX)
> > >
> > > ## MSM8909 (SoC) + PM8909 (PMIC)
> > > We need to scale 1 power domain together with cpufreq:
> > >
> > > - VDD_APC = VDD_CX = &pm8909_s1 (via RPMPD: MSM8909_VDDCX)
> > >   (CPU logic, L2 logic and CPU PLL)
> > > (- VDD_MX (L2 memory) = &pm8909_l3 (RPM firmware enforces VDD_MX >= VDD_CX))
> > >
> > > There is implicit magic in the RPM firmware here that saves us from
> > > scaling VDD_MX. VDD_CX/APC are the same power rail.
> > >
> > > ## MSM8909 (SoC) + PM8916 (PMIC)
> > > When MSM8909 is paired with PM8916 instead of PM8909, the setup is
> > > identical to MSM8916+PM8916. We need to scale 3 power domains.
> > >
> > > > In a way it sounds like an option could be to hook up the cpr to the
> > > > rpmpd:s instead (possibly even set it as a child-domains to the
> > > > rpmpd:s), assuming that is a better description of the HW, which it
> > > > may not be, of course.
> > >
> > > Hm. It's definitely an option. I must admit I haven't really looked
> > > much at child-domains so far, so spontaneously I'm not sure about
> > > the implications, for both the abstract hardware description and
> > > the implementation.
> > >
> > > There does indeed seem to be some kind of relation between MX <=> CX/APC:
> > >
> > > - When voting for CX in the RPM firmware, it will always implicitly
> > >   adjust the MX performance state to be MX >= CX.
> > >
> > > - When scaling APC up, we must increase MX before APC.
> > > - When scaling APC down, we must decrease MX after APC.
> > > => Clearly MX >= APC. Not in terms of raw voltage, but at least for the
> > >    abstract performance state.
> > >
> > > Is this some kind of parent-child relationship between MX <=> CX and
> > > MX <=> APC?
> >
> > Thanks for sharing the above. Yes, to me, it looks like there is a
> > parent/child-domain relationship that could be worth describing/using.
> >
> > >
> > > If yes, maybe we could indeed bind MX to the CPR genpd somehow. They use
> > > different performance state numbering, so we need some kind of
> > > translation. I'm not entirely sure how that would be described.
> >
> > Both the power-domain and the required-opps DT bindings
> > (Documentation/devicetree/bindings/opp/opp-v2-base.yaml) already allow
> > us to describe these kinds of hierarchical dependencies/layouts.
> >
> > In other words, to scale performance for a child domain, the child may
> > rely on performance being scaled for the parent domain too. This is
> > already supported by genpd and through the opp library - so it should
> > just work. :-)
> >
>
> I'm getting back to the "multiple power domains" case of MSM8916 now, as
> discussed above. I've tried modelling MX as a parent genpd of CPR, to
> avoid having to scale multiple power domains as part of cpufreq.
>
> Basically, it looks like the following:
>
> cpr: power-controller@b018000 {
>         compatible = "qcom,msm8916-cpr", "qcom,cpr";
>         reg = <0x0b018000 0x1000>;
>         /* ... */
>         #power-domain-cells = <0>;
>         operating-points-v2 = <&cpr_opp_table>;
>         /* Supposed to be parent domain, not consumer */
>         power-domains = <&rpmpd MSM8916_VDDMX_AO>;
>
>         cpr_opp_table: opp-table {
>                 compatible = "operating-points-v2-qcom-level";
>
>                 cpr_opp1: opp1 {
>                         opp-level = <1>;
>                         qcom,opp-fuse-level = <1>;
>                         required-opps = <&rpmpd_opp_svs_soc>;
>                 };
>                 cpr_opp2: opp2 {
>                         opp-level = <2>;
>                         qcom,opp-fuse-level = <2>;
>                         required-opps = <&rpmpd_opp_nom>;
>                 };
>                 cpr_opp3: opp3 {
>                         opp-level = <3>;
>                         qcom,opp-fuse-level = <3>;
>                         required-opps = <&rpmpd_opp_super_turbo>;
>                 };
>         };
> };
>
> As already discussed [1] it's a bit annoying that the genpd core
> attaches the power domain as a consumer by default, but I work around
> this by calling of_genpd_add_subdomain() followed by
> dev_pm_domain_detach() in the CPR driver.

Yep, that seems reasonable to me.

>
> The actual scaling works fine: performance states of the MX power domain
> are updated when the CPR performance state changes. I added some debug
> prints and it looks e.g. as follows (CPR is the power-controller@):
>
> [   24.498218] PM: mx_ao set performance state 6
> [   24.498788] PM: power-controller@b018000 set performance state 3
> [   24.511025] PM: mx_ao set performance state 3
> [   24.511526] PM: power-controller@b018000 set performance state 1
> [   24.521189] PM: mx_ao set performance state 4
> [   24.521660] PM: power-controller@b018000 set performance state 2
> [   24.533183] PM: mx_ao set performance state 6
> [   24.533535] PM: power-controller@b018000 set performance state 3
>
> There is one remaining problem here: Consider e.g. the switch from CPR
> performance state 3 -> 1. In both cases the parent genpd state is set
> *before* the child genpd. When scaling down, the parent genpd state must
> be reduced *after* the child genpd. Otherwise, we can't guarantee that
> the parent genpd state is always >= the child state.

Good point!

>
> In the OPP core, the order of such operations is always chosen based on
> whether we are scaling up or down. When scaling up, power domain states
> are set before the frequency is changed, and the other way around for
> scaling down.
>
> Is this something you could imagine changing in the GENPD core, either
> unconditionally for everyone, or as an option?

This sounds like a generic problem that we need to fix for genpd. So for
everyone.

>
> I tried to hack this in for a quick test and came up with the following
> (the diff is unreadable so I'll just post the entire changed
> _genpd_set_performance_state() function). Admittedly it's a bit ugly.
>
> With these changes the sequence from above looks more like:
>
> [   22.374555] PM: mx_ao set performance state 6
> [   22.375175] PM: power-controller@b018000 set performance state 3
> [   22.424661] PM: power-controller@b018000 set performance state 1
> [   22.425169] PM: mx_ao set performance state 3
> [   22.434932] PM: mx_ao set performance state 4
> [   22.435331] PM: power-controller@b018000 set performance state 2
> [   22.461197] PM: mx_ao set performance state 6
> [   22.461968] PM: power-controller@b018000 set performance state 3
>
> Which is correct now.
>
> Let me know if you have any thoughts about this. :-)

Makes sense! Please post the below as a formal patch so I can review and
test it!
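On a side note, the of_genpd_add_subdomain() + dev_pm_domain_detach()
workaround you describe above could look roughly like the sketch below in
the CPR driver's probe path, after the CPR genpd provider has been
registered (it only needs <linux/of.h> and <linux/pm_domain.h>). The
function name and the error handling are made up for illustration, so
treat it as a sketch rather than the final code:

static int cpr_setup_parent_domain(struct device *dev)
{
        struct of_phandle_args parent, child;
        int ret;

        /* "power-domains" points at the MX rpmpd, intended as parent domain */
        ret = of_parse_phandle_with_args(dev->of_node, "power-domains",
                                         "#power-domain-cells", 0, &parent);
        if (ret)
                return ret;

        /* The CPR provider itself (#power-domain-cells = <0>) */
        child.np = dev->of_node;
        child.args_count = 0;

        ret = of_genpd_add_subdomain(&parent, &child);
        of_node_put(parent.np);
        if (ret)
                return ret;

        /* Drop the consumer link that the genpd core attached at probe time */
        dev_pm_domain_detach(dev, false);
        return 0;
}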
Kind regards
Uffe

>
> Thanks for taking the time to discuss this!
> Stephan
>
> [1]: https://lore.kernel.org/linux-pm/CAPDyKFq+zsoeF-4h5TfT4Z+S46a501_pUq8y2c1x==Tt6EKBGA@xxxxxxxxxxxxxx/
>
> static int _genpd_set_performance_state(struct generic_pm_domain *genpd,
>                                         unsigned int state, int depth);
>
> static void _genpd_rollback_parent_state(struct gpd_link *link, int depth)
> {
>         struct generic_pm_domain *parent = link->parent;
>         int parent_state;
>
>         genpd_lock_nested(parent, depth + 1);
>
>         parent_state = link->prev_performance_state;
>         link->performance_state = parent_state;
>
>         parent_state = _genpd_reeval_performance_state(parent, parent_state);
>         if (_genpd_set_performance_state(parent, parent_state, depth + 1)) {
>                 pr_err("%s: Failed to roll back to %d performance state\n",
>                        parent->name, parent_state);
>         }
>
>         genpd_unlock(parent);
> }
>
> static int _genpd_set_parent_state(struct generic_pm_domain *genpd,
>                                    struct gpd_link *link,
>                                    unsigned int state, int depth)
> {
>         struct generic_pm_domain *parent = link->parent;
>         int parent_state, ret;
>
>         /* Find parent's performance state */
>         ret = genpd_xlate_performance_state(genpd, parent, state);
>         if (unlikely(ret < 0))
>                 return ret;
>
>         parent_state = ret;
>
>         genpd_lock_nested(parent, depth + 1);
>
>         link->prev_performance_state = link->performance_state;
>         link->performance_state = parent_state;
>         parent_state = _genpd_reeval_performance_state(parent,
>                                                        parent_state);
>         ret = _genpd_set_performance_state(parent, parent_state, depth + 1);
>         if (ret)
>                 link->performance_state = link->prev_performance_state;
>
>         genpd_unlock(parent);
>
>         return ret;
> }
>
> static int _genpd_set_performance_state(struct generic_pm_domain *genpd,
>                                         unsigned int state, int depth)
> {
>         struct gpd_link *link = NULL;
>         int ret;
>
>         if (state == genpd->performance_state)
>                 return 0;
>
>         /* When scaling up, propagate to parents first in normal order */
>         if (state > genpd->performance_state) {
>                 list_for_each_entry(link, &genpd->child_links, child_node) {
>                         ret = _genpd_set_parent_state(genpd, link, state, depth);
>                         if (ret)
>                                 goto rollback_parents_up;
>                 }
>         }
>
>         if (genpd->set_performance_state) {
>                 pr_err("%s set performance state %d\n", genpd->name, state);
>                 ret = genpd->set_performance_state(genpd, state);
>                 if (ret) {
>                         if (link)
>                                 goto rollback_parents_up;
>                         return ret;
>                 }
>         }
>
>         /* When scaling down, propagate to parents after in reverse order */
>         if (state < genpd->performance_state) {
>                 list_for_each_entry_reverse(link, &genpd->child_links, child_node) {
>                         ret = _genpd_set_parent_state(genpd, link, state, depth);
>                         if (ret)
>                                 goto rollback_parents_down;
>                 }
>         }
>
>         genpd->performance_state = state;
>         return 0;
>
> rollback_parents_up:
>         list_for_each_entry_continue_reverse(link, &genpd->child_links, child_node)
>                 _genpd_rollback_parent_state(link, depth);
>         return ret;
> rollback_parents_down:
>         list_for_each_entry_continue(link, &genpd->child_links, child_node)
>                 _genpd_rollback_parent_state(link, depth);
>         return ret;
> }
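As an aside, the ordering rule described above for the OPP core (raise the
power domain performance state before changing the clock when scaling up,
lower it afterwards when scaling down) boils down to roughly the pattern
below. This is only an illustration, not the actual OPP core code; the
helper name and the single clock/power domain are simplifications:

/*
 * Illustrative only: keep the power domain performance state high enough
 * for the currently running frequency by raising it before the clock when
 * scaling up, and lowering it after the clock when scaling down.
 */
static int example_set_rate(struct device *dev, struct clk *clk,
                            unsigned long old_freq, unsigned long new_freq,
                            unsigned int pstate)
{
        int ret;

        if (new_freq > old_freq) {
                ret = dev_pm_genpd_set_performance_state(dev, pstate);
                if (ret)
                        return ret;
        }

        ret = clk_set_rate(clk, new_freq);
        if (ret)
                return ret;

        if (new_freq < old_freq)
                ret = dev_pm_genpd_set_performance_state(dev, pstate);

        return ret;
}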