[...] > > > > > > Here are the two commits with the my current DT changes (WIP): > > > - MSM8909+PM8909 (RPMPD only): > > > https://github.com/msm8916-mainline/linux/commit/791e0c5a3162372a0738bc7b0f4a5e87247923db > > > > Okay, so this looks pretty straight forward. One thing though, it > > looks like we need to update the DT bindings for cpus. > > > > I recently updated Documentation/devicetree/bindings/arm/cpus.yaml > > [1], to let "perf" be the common "power-domain-name" for a CPU's SCMI > > performance-domain. I look like we should extend the description to > > allow "perf" to be used for all types of performance domains. > > > > "perf" sounds fine for a single power domain, I just used "apc" here for > consistency with the MSM8916 changes (which scales this power domain and > several others, as you saw). > > (BTW, I would appreciate such a generic name for the cpuidle case as > well, so "idle" instead of "psci" vs "sbi". I have another WIP cpuidle > driver and didn't want to invent another name there...) Whether it's "idle" or "power" or something else, we should certainly avoid a provider specific (psci) name, as has been pointed out earlier by Rob too. I will try to get some time to update the DT docs as soon as I can. Unless you get to it first, feel free to do it. > > > > - MSM8916 (CPR+RPMPD): > > > https://github.com/msm8916-mainline/linux/commit/8880f39108206d7a60a0a8351c0373bddf58657c > > > > This looks a bit odd to me. Does a CPU really have four different > > power-domains, where three of them are performance-domains? > > > > Good question. I think we're largely entering "uncharted territory" with > these questions, I can just try to answer it the best I can from the > limited documentation and knowledge I have. :) > > The CPU does indeed use four different power domains. There also seem to > be additional power switches that gate power for some components without > having to turn off the entire supply. > > I'll list them twice from two points of view: Once mapping component -> > power domain, then again showing each power domain separately to make it > more clear. At the end I also want to make clear that MSM8909 (with the > "single" power domain) is actually exactly the same SoC design, just > with different regulators supplying the power domains. > > It's totally fine if you just skim over it. I'm listing it in detail > also as reference for myself. :D > > # Components > - SoC > - CPU subsystem ("APPS") > - CPU cluster > - 4x CPU core (logic and L1 cache) -> VDD_APC > - Shared L2 cache > - Logic -> VDD_APC > - Memory -> VDD_MX > - CPU clock controller (logic) -> VDD_CX > - Provides CPU frequency from different clock sources > - L2 cache runs at 1/2 of CPU frequency > => Both VDD_APC and VDD_MX must be scaled based on frequency > - CPU PLL clock source > - Generates the higher (GHz) CPU frequencies > - Logic (?, unsure) -> VDD_CX > - ??? -> VDD_SR2_APPS_PLL > => VDD_CX must be scaled based on PLL frequency > > # Power Domains > ## VDD_APC > - dedicated for CPU > - powered off completely in deepest cluster cpuidle state > > - per-core power switch (per-core cpuidle) > - CPU logic > - L1 cache controller/logic and maybe memory(?, unsure) > - shared L2 cache controller/logic > > => must be scaled based on CPU frequency > > ## VDD_MX > - global SoC power domain for "on-chip memories" > - always on, reduced to minimal voltage when entire SoC is idle > > - power switch (controlled by deepest cluster cpuidle state?, unsure) > - L2 cache memory > > => must be scaled based on L2 frequency (=> 1/2 CPU frequency) > > ## VDD_CX > - global SoC power domain for "digital logic" > - always on, reduced to minimal voltage when entire SoC is idle > - voting for VDD_CX in the RPM firmware also affects VDD_MX performance > state (firmware implicitly sets VDD_MX >= VDD_CX) > > - CPU clock controller logic, CPU PLL logic(?, unsure) > > => must be scaled based on CPU PLL frequency > > ## VDD_SR2_APPS_PLL > - global SoC power domain for CPU clock PLLs > - on MSM8916: always on with constant voltage > > => ignored in Linux at the moment > > # Power Domain Regulators > These power domains are literally input pins on the SoC chip. In theory > one could connect any suitable regulator to each of those. In practice > there are just a couple of standard reference designs that everyone > uses: > > ## MSM8916 (SoC) + PM8916 (PMIC) > We need to scale 3 power domains together with cpufreq: > > - VDD_APC (CPU logic) = &pm8916_spmi_s2 (via CPR) > - VDD_MX (L2 memory) = &pm8916_l3 (via RPMPD: MSM8916_VDDMX) > - VDD_CX (CPU PLL) = &pm8916_s1 (via RPMPD: MSM8916_VDDCX) > > ## MSM8909 (SoC) + PM8909 (PMIC) > We need to scale 1 power domain together with cpufreq: > > - VDD_APC = VDD_CX = &pm8909_s1 (via RPMPD: MSM8909_VDDCX) > (CPU logic, L2 logic and CPU PLL) > (- VDD_MX (L2 memory) = &pm8909_l3 (RPM firmware enforces VDD_MX >= VDD_CX)) > > There is implicit magic in the RPM firmware here that saves us from > scaling VDD_MX. VDD_CX/APC are the same power rail. > > ## MSM8909 (SoC) + PM8916 (PMIC) > When MSM8909 is paired with PM8916 instead of PM8909, the setup is > identical to MSM8916+PM8916. We need to scale 3 power domains. > > > In a way it sounds like an option could be to hook up the cpr to the > > rpmpd:s instead (possibly even set it as a child-domains to the > > rpmpd:s), assuming that is a better description of the HW, which it > > may not be, of course. > > Hm. It's definitely an option. I must admit I haven't really looked > much at child-domains so far, so spontaneously I'm not sure about > the implications, for both the abstract hardware description and > the implementation. > > There seems to be indeed some kind of relation between MX <=> CX/APC: > > - When voting for CX in the RPM firmware, it will always implicitly > adjust the MX performance state to be MX >= CX. > > - When scaling APC up, we must increase MX before APC. > - When scaling APC down, we must decrease MX after APC. > => Clearly MX >= APC. Not in terms of raw voltage, but at least for the > abstract performance state. > > Is this some kind of parent-child relationship between MX <=> CX and > MX <=> APC? Thanks for sharing the above. Yes, to me, it looks like there is a parent/child-domain relationship that could be worth describing/using. > > If yes, maybe we could indeed bind MX to the CPR genpd somehow. They use > different performance state numbering, so we need some kind of > translation. I'm not entirely sure how that would be described. Both the power-domain and the required-opps DT bindings (Documentation/devicetree/bindings/opp/opp-v2-base.yaml) are already allowing us to describe these kinds of hierarchical dependencies/layouts. In other words, to scale performance for a child domain, the child may rely on that we scale performance for the parent domain too. This is already supported by genpd and through the opp library - so it should just work. :-) > > Scaling VDD_CX for the PLL is more complicated. APC <=> CX feel more > like siblings, so I don't think it makes sense to vote for CX as part of > the CPR genpd. Spontaneously I would argue scaling CX belongs into the > CPU PLL driver (since that's what the vote is for). However, for some > reason it was decided to handle such votes on the consumer side (here = > cpufreq) on mainline [1]. > > [1]: https://lore.kernel.org/linux-arm-msm/20200910162610.GA7008@xxxxxxxxxxx/ Right. I assume the above works just fine, so probably best to stick to that for this case too. [...] > > > > *) The approach you have taken in the $subject patch with the call to > > pm_runtime_resume_and_get() works as a fix for QCS404, as there is > > only the CPR to attach to. The problem with it, is that it doesn't > > work for cases where the RPMPD is used for performance scaling, either > > separate or in combination with the CPR. It would keep the rpmpd:s > > powered-on, which would be wrong. In regards to the > > dev_pm_syscore_device() thingy, this should not be needed, as long as > > we keep the vdd-apc-supply enabled, right? > > > > **) A more generic solution, that would work for all cases (even > > when/if hooking up the CPR to the rpmpd:s), consists of tweaking genpd > > to avoid "caching" performance states for these kinds of devices. And > > again, I don't see that we need dev_pm_syscore_device(), assuming we > > manage the vdd-apc-supply correctly. > > > > Did I miss anything? > > > > We do need to keep the CPU-related RPMPDs always-on too. > > Keeping the CPU-related RPMPDs always-on is a bit counter-intuitive, but > it's because of this: > > > > > > - RPMPD: This is the generic driver for all the SoC power domains > > > > > managed by the RPM firmware. It's not CPU-specific. However, as > > > > > special feature each power domain is exposed twice in Linux, e.g. > > > > > "MSM8909_VDDCX" and "MSM8909_VDDCX_AO". The _AO ("active-only") > > > > > variant tells the RPM firmware that the performance/enable vote only > > > > > applies when the CPU is active (not in deep cpuidle state). > > The CPU only uses the "_AO"/active-only variants in RPMPD. Keeping these > always-on effectively means "keep the power domain on as long as the CPU > is active". > > I hope that clears up some of the confusion. :) Yes, it does, thanks! Is the below the correct conclusion for how we could move forward then? *) The pm_runtime_resume_and_get() works for QCS404 as a fix. It also works fine when there is only one RPMPD that manages the performance scaling. **) In cases where we have multiple PM domains to scale performance for, using pm_runtime_resume_and_get() would work fine too. Possibly we want to use device_link_add() to set up suppliers, to avoid calling pm_runtime_resume_and_get() for each and every device. ***) Due to the above, we don't need a new mechanism to avoid "caching" performance states for genpd. At least for the time being. Kind regards Uffe