Re: [PATCH 1/4] cpufreq: qcom-nvmem: Handling multiple power domains

On Tue, 28 Nov 2023 at 13:42, Stephan Gerhold <stephan@xxxxxxxxxxx> wrote:
>
> Hi Uffe,
>
> On Mon, Oct 16, 2023 at 04:47:52PM +0200, Ulf Hansson wrote:
> > [...]
> > > > >   - MSM8916 (CPR+RPMPD):
> > > > >     https://github.com/msm8916-mainline/linux/commit/8880f39108206d7a60a0a8351c0373bddf58657c
> > > >
> > > > This looks a bit odd to me. Does a CPU really have four different
> > > > power-domains, where three of them are performance-domains?
> > > >
> > >
> > > Good question. I think we're largely entering "uncharted territory" with
> > > these questions; I can only try to answer them as best I can from the
> > > limited documentation and knowledge I have. :)
> > >
> > > The CPU does indeed use four different power domains. There also seem to
> > > be additional power switches that gate power for some components without
> > > having to turn off the entire supply.
> > >
> > > I'll list them twice, from two points of view: once mapping component ->
> > > power domain, then again showing each power domain separately to make it
> > > clearer. At the end I also want to make clear that MSM8909 (with the
> > > "single" power domain) is actually exactly the same SoC design, just
> > > with different regulators supplying the power domains.
> > >
> > > It's totally fine if you just skim over it. I'm listing it in detail
> > > also as a reference for myself. :D
> > >
> > > # Components
> > >  - SoC
> > >    - CPU subsystem ("APPS")
> > >      - CPU cluster
> > >        - 4x CPU core (logic and L1 cache) -> VDD_APC
> > >        - Shared L2 cache
> > >          - Logic -> VDD_APC
> > >          - Memory -> VDD_MX
> > >      - CPU clock controller (logic) -> VDD_CX
> > >        - Provides CPU frequency from different clock sources
> > >        - L2 cache runs at 1/2 of CPU frequency
> > >        => Both VDD_APC and VDD_MX must be scaled based on frequency
> > >      - CPU PLL clock source
> > >        - Generates the higher (GHz) CPU frequencies
> > >        - Logic (?, unsure) -> VDD_CX
> > >        - ??? -> VDD_SR2_APPS_PLL
> > >        => VDD_CX must be scaled based on PLL frequency
> > >
> > > # Power Domains
> > > ## VDD_APC
> > >  - dedicated for CPU
> > >  - powered off completely in deepest cluster cpuidle state
> > >
> > >  - per-core power switch (per-core cpuidle)
> > >    - CPU logic
> > >    - L1 cache controller/logic and maybe memory(?, unsure)
> > >  - shared L2 cache controller/logic
> > >
> > >  => must be scaled based on CPU frequency
> > >
> > > ## VDD_MX
> > >  - global SoC power domain for "on-chip memories"
> > >  - always on, reduced to minimal voltage when entire SoC is idle
> > >
> > >  - power switch (controlled by deepest cluster cpuidle state?, unsure)
> > >    - L2 cache memory
> > >
> > >  => must be scaled based on L2 frequency (=> 1/2 CPU frequency)
> > >
> > > ## VDD_CX
> > >  - global SoC power domain for "digital logic"
> > >  - always on, reduced to minimal voltage when entire SoC is idle
> > >  - voting for VDD_CX in the RPM firmware also affects VDD_MX performance
> > >    state (firmware implicitly sets VDD_MX >= VDD_CX)
> > >
> > >  - CPU clock controller logic, CPU PLL logic(?, unsure)
> > >
> > >  => must be scaled based on CPU PLL frequency
> > >
> > > ## VDD_SR2_APPS_PLL
> > >  - global SoC power domain for CPU clock PLLs
> > >  - on MSM8916: always on with constant voltage
> > >
> > >  => ignored in Linux at the moment
> > >
> > > # Power Domain Regulators
> > > These power domains are literally input pins on the SoC chip. In theory
> > > one could connect any suitable regulator to each of those. In practice
> > > there are just a couple of standard reference designs that everyone
> > > uses:
> > >
> > > ## MSM8916 (SoC) + PM8916 (PMIC)
> > > We need to scale 3 power domains together with cpufreq:
> > >
> > >  - VDD_APC (CPU logic) = &pm8916_spmi_s2 (via CPR)
> > >  - VDD_MX  (L2 memory) = &pm8916_l3 (via RPMPD: MSM8916_VDDMX)
> > >  - VDD_CX  (CPU PLL)   = &pm8916_s1 (via RPMPD: MSM8916_VDDCX)
> > >
> > > ## MSM8909 (SoC) + PM8909 (PMIC)
> > > We need to scale 1 power domain together with cpufreq:
> > >
> > >  - VDD_APC = VDD_CX    = &pm8909_s1 (via RPMPD: MSM8909_VDDCX)
> > >    (CPU logic, L2 logic and CPU PLL)
> > > (- VDD_MX  (L2 memory) = &pm8909_l3 (RPM firmware enforces VDD_MX >= VDD_CX))
> > >
> > > There is implicit magic in the RPM firmware here that saves us from
> > > scaling VDD_MX. VDD_CX/APC are the same power rail.
> > >
> > > ## MSM8909 (SoC) + PM8916 (PMIC)
> > > When MSM8909 is paired with PM8916 instead of PM8909, the setup is
> > > identical to MSM8916+PM8916. We need to scale 3 power domains.
> > >
> > > > In a way it sounds like an option could be to hook up the cpr to the
> > > > rpmpd:s instead (possibly even set it as a child-domain to the
> > > > rpmpd:s), assuming that is a better description of the HW, which it
> > > > may not be, of course.
> > >
> > > Hm. It's definitely an option. I must admit I haven't really looked
> > > much at child-domains so far, so off the top of my head I'm not sure
> > > about the implications, both for the abstract hardware description and
> > > for the implementation.
> > >
> > > There seems to be indeed some kind of relation between MX <=> CX/APC:
> > >
> > >  - When voting for CX in the RPM firmware, it will always implicitly
> > >    adjust the MX performance state to be MX >= CX.
> > >
> > >  - When scaling APC up, we must increase MX before APC.
> > >  - When scaling APC down, we must decrease MX after APC.
> > >  => Clearly MX >= APC. Not in terms of raw voltage, but at least for the
> > >     abstract performance state.
> > >
> > > Is this some kind of parent-child relationship between MX <=> CX and
> > > MX <=> APC?
> >
> > Thanks for sharing the above. Yes, to me, it looks like there is a
> > parent/child-domain relationship that could be worth describing/using.
> >
> > >
> > > If yes, maybe we could indeed bind MX to the CPR genpd somehow. They use
> > > different performance state numbering, so we need some kind of
> > > translation. I'm not entirely sure how that would be described.
> >
> > Both the power-domain and the required-opps DT bindings
> > (Documentation/devicetree/bindings/opp/opp-v2-base.yaml) already allow
> > us to describe these kinds of hierarchical dependencies/layouts.
> >
> > In other words, to scale performance for a child domain, the child can
> > rely on performance for the parent domain being scaled too. This is
> > already supported by genpd and through the opp library - so it should
> > just work. :-)
> >
>
> I'm getting back to the "multiple power domains" case of MSM8916 now, as
> discussed above. I've tried modelling MX as a parent genpd of CPR, to
> avoid having to scale multiple power domains as part of cpufreq.
>
> Basically, it looks like the following:
>
>         cpr: power-controller@b018000 {
>                 compatible = "qcom,msm8916-cpr", "qcom,cpr";
>                 reg = <0x0b018000 0x1000>;
>                 /* ... */
>                 #power-domain-cells = <0>;
>                 operating-points-v2 = <&cpr_opp_table>;
>                 /* Supposed to be parent domain, not consumer */
>                 power-domains = <&rpmpd MSM8916_VDDMX_AO>;
>
>                 cpr_opp_table: opp-table {
>                         compatible = "operating-points-v2-qcom-level";
>
>                         cpr_opp1: opp1 {
>                                 opp-level = <1>;
>                                 qcom,opp-fuse-level = <1>;
>                                 required-opps = <&rpmpd_opp_svs_soc>;
>                         };
>                         cpr_opp2: opp2 {
>                                 opp-level = <2>;
>                                 qcom,opp-fuse-level = <2>;
>                                 required-opps = <&rpmpd_opp_nom>;
>                         };
>                         cpr_opp3: opp3 {
>                                 opp-level = <3>;
>                                 qcom,opp-fuse-level = <3>;
>                                 required-opps = <&rpmpd_opp_super_turbo>;
>                         };
>                 };
>         };
>
> As already discussed [1], it's a bit annoying that the genpd core
> attaches the power domain as a consumer by default, but I work around
> this by calling of_genpd_add_subdomain() followed by
> dev_pm_domain_detach() in the CPR driver.
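>
> For illustration, the sequence looks roughly like this, called after the
> CPR genpd and its OF provider have been registered (a rough sketch of
> the idea, not the actual driver code; the helper name is made up):
>
>         /* Needs <linux/of.h> and <linux/pm_domain.h>. */
>         static int cpr_link_parent_domain(struct device *dev)
>         {
>                 struct of_phandle_args parent, child;
>                 int ret;
>
>                 /* VDD_MX, referenced by the "power-domains" property */
>                 ret = of_parse_phandle_with_args(dev->of_node,
>                                                  "power-domains",
>                                                  "#power-domain-cells",
>                                                  0, &parent);
>                 if (ret)
>                         return ret;
>
>                 /* The CPR genpd itself (#power-domain-cells = <0>) */
>                 child.np = dev->of_node;
>                 child.args_count = 0;
>
>                 /* Register MX as parent domain of the CPR genpd ... */
>                 ret = of_genpd_add_subdomain(&parent, &child);
>                 of_node_put(parent.np);
>                 if (ret)
>                         return ret;
>
>                 /* ... and drop the consumer link created during probe */
>                 dev_pm_domain_detach(dev, false);
>                 return 0;
>         }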

Yep, that seems reasonable to me.

>
> The actual scaling works fine: performance states of the MX power domain
> are updated when the CPR performance state changes. I added some debug
> prints and it looks e.g. as follows (CPR is the power-controller@b018000):
>
>     [   24.498218] PM: mx_ao set performance state 6
>     [   24.498788] PM: power-controller@b018000 set performance state 3
>     [   24.511025] PM: mx_ao set performance state 3
>     [   24.511526] PM: power-controller@b018000 set performance state 1
>     [   24.521189] PM: mx_ao set performance state 4
>     [   24.521660] PM: power-controller@b018000 set performance state 2
>     [   24.533183] PM: mx_ao set performance state 6
>     [   24.533535] PM: power-controller@b018000 set performance state 3
>
> There is one remaining problem here: Consider e.g. the switch from CPR
> performance state 3 -> 1. Right now the parent genpd state is set
> *before* the child genpd in both directions. When scaling down, the
> parent genpd state must instead be reduced *after* the child genpd.
> Otherwise, we can't guarantee that the parent genpd state is always >=
> the child state.

Good point!

>
> In the OPP core, the order of such operations is always chosen based on
> whether we are scaling up or down. When scaling up, power domain states
> are set before the frequency is changed, and the other way around for
> scaling down.
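>
> Schematically it is something like this (simplified pseudo-code, not the
> actual OPP core code; set_required_opps() and set_cpu_frequency() are
> just made-up names for the two steps):
>
>         if (scaling_up) {
>                 /* Raise power domain performance states first ... */
>                 ret = set_required_opps(dev, new_opp);
>                 if (!ret)
>                         ret = set_cpu_frequency(dev, new_freq);
>         } else {
>                 /* ... but lower the clock before dropping them. */
>                 ret = set_cpu_frequency(dev, new_freq);
>                 if (!ret)
>                         ret = set_required_opps(dev, new_opp);
>         }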
>
> Is this something you could imagine changing in the GENPD core, either
> unconditionally for everyone, or as an option?

This sounds like a generic problem that we need to fix for genpd. So
for everyone.

>
> I tried to hack this in for a quick test and came up with the following
> (the diff is unreadable so I'll just post the entire changed
> _genpd_set_performance_state() function). Admittedly it's a bit ugly.
>
> With these changes the sequence from above looks more like:
>
>     [   22.374555] PM: mx_ao set performance state 6
>     [   22.375175] PM: power-controller@b018000 set performance state 3
>     [   22.424661] PM: power-controller@b018000 set performance state 1
>     [   22.425169] PM: mx_ao set performance state 3
>     [   22.434932] PM: mx_ao set performance state 4
>     [   22.435331] PM: power-controller@b018000 set performance state 2
>     [   22.461197] PM: mx_ao set performance state 6
>     [   22.461968] PM: power-controller@b018000 set performance state 3
>
> Which is correct now.
>
> Let me know if you have any thoughts about this. :-)

Makes sense! Please post the below as a formal patch so I can review
and test it!

Kind regards
Uffe

>
> Thanks for taking the time to discuss this!
> Stephan
>
> [1]: https://lore.kernel.org/linux-pm/CAPDyKFq+zsoeF-4h5TfT4Z+S46a501_pUq8y2c1x==Tt6EKBGA@xxxxxxxxxxxxxx/
>
> static int _genpd_set_performance_state(struct generic_pm_domain *genpd,
>                                         unsigned int state, int depth);
>
> static void _genpd_rollback_parent_state(struct gpd_link *link, int depth)
> {
>         struct generic_pm_domain *parent = link->parent;
>         int parent_state;
>
>         genpd_lock_nested(parent, depth + 1);
>
>         parent_state = link->prev_performance_state;
>         link->performance_state = parent_state;
>
>         parent_state = _genpd_reeval_performance_state(parent, parent_state);
>         if (_genpd_set_performance_state(parent, parent_state, depth + 1)) {
>                 pr_err("%s: Failed to roll back to %d performance state\n",
>                        parent->name, parent_state);
>         }
>
>         genpd_unlock(parent);
> }
>
> static int _genpd_set_parent_state(struct generic_pm_domain *genpd,
>                                    struct gpd_link *link,
>                                    unsigned int state, int depth)
> {
>         struct generic_pm_domain *parent = link->parent;
>         int parent_state, ret;
>
>         /* Find parent's performance state */
>         ret = genpd_xlate_performance_state(genpd, parent, state);
>         if (unlikely(ret < 0))
>                 return ret;
>
>         parent_state = ret;
>
>         genpd_lock_nested(parent, depth + 1);
>
>         link->prev_performance_state = link->performance_state;
>         link->performance_state = parent_state;
>         parent_state = _genpd_reeval_performance_state(parent,
>                                                 parent_state);
>         ret = _genpd_set_performance_state(parent, parent_state, depth + 1);
>         if (ret)
>                 link->performance_state = link->prev_performance_state;
>
>         genpd_unlock(parent);
>
>         return ret;
> }
>
> static int _genpd_set_performance_state(struct generic_pm_domain *genpd,
>                                         unsigned int state, int depth)
> {
>         struct gpd_link *link = NULL;
>         int ret;
>
>         if (state == genpd->performance_state)
>                 return 0;
>
>         /* When scaling up, propagate to parents first in normal order */
>         if (state > genpd->performance_state) {
>                 list_for_each_entry(link, &genpd->child_links, child_node) {
>                         ret = _genpd_set_parent_state(genpd, link, state, depth);
>                         if (ret)
>                                 goto rollback_parents_up;
>                 }
>         }
>
>         if (genpd->set_performance_state) {
>                 pr_err("%s set performance state %d\n", genpd->name, state);
>                 ret = genpd->set_performance_state(genpd, state);
>                 if (ret) {
>                         if (link)
>                                 goto rollback_parents_up;
>                         return ret;
>                 }
>         }
>
>         /* When scaling down, propagate to parents after in reverse order */
>         if (state < genpd->performance_state) {
>                 list_for_each_entry_reverse(link, &genpd->child_links, child_node) {
>                         ret = _genpd_set_parent_state(genpd, link, state, depth);
>                         if (ret)
>                                 goto rollback_parents_down;
>                 }
>         }
>
>         genpd->performance_state = state;
>         return 0;
>
> rollback_parents_up:
>         list_for_each_entry_continue_reverse(link, &genpd->child_links, child_node)
>                 _genpd_rollback_parent_state(link, depth);
>         return ret;
> rollback_parents_down:
>         list_for_each_entry_continue(link, &genpd->child_links, child_node)
>                 _genpd_rollback_parent_state(link, depth);
>         return ret;
> }
>



