On Thu, 19 Jan 2023 at 19:42, Krzysztof Kozlowski <krzysztof.kozlowski@xxxxxxxxxx> wrote: > > The runtime Power Management of CPU topology is not compatible with > PREEMPT_RT: > 1. Core cpuidle path disables IRQs. > 2. Core cpuidle calls cpuidle-psci. > 3. cpuidle-psci in __psci_enter_domain_idle_state() calls > pm_runtime_put_sync_suspend() and pm_runtime_get_sync() which use > spinlocks (which are sleeping on PREEMPT_RT). > > Deep sleep modes are not a priority of Realtime kernels because the > latencies might become unpredictable. On the other hand the PSCI CPU > idle power domain is a parent of other devices and power domain > controllers, thus it cannot be simply skipped (e.g. on Qualcomm SM8250). > > Disable the runtime PM calls from cpuidle-psci, which effectively stops > suspending the cpuidle PSCI domain. This is a trade-off between making > PREEMPT_RT working and still having a proper power domain hierarchy in > the system. I think this sounds like a reasonable compromise, at least at this point. > > Cc: Adrien Thierry <athierry@xxxxxxxxxx> > Cc: Brian Masney <bmasney@xxxxxxxxxx> > Cc: linux-rt-users@xxxxxxxxxxxxxxx > Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@xxxxxxxxxx> > > --- > > Changes since v1: > 1. Re-work commit msg. > 2. Add note to Kconfig. > > Several other patches were dropped, as this is the only one actually > needed. It effectively stops PSCI cpuidle power domains from suspending > thus solving all other issues I experienced. I like this approach better, thanks! > --- > drivers/cpuidle/Kconfig.arm | 3 +++ > drivers/cpuidle/cpuidle-psci.c | 4 ++-- > 2 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm > index 747aa537389b..24429b5bfd1c 100644 > --- a/drivers/cpuidle/Kconfig.arm > +++ b/drivers/cpuidle/Kconfig.arm > @@ -24,6 +24,9 @@ config ARM_PSCI_CPUIDLE > It provides an idle driver that is capable of detecting and > managing idle states through the PSCI firmware interface. > > + The driver is not yet compatible with PREEMPT_RT: no idle states will > + be entered by CPUs on such kernel. This isn't entirely correct. In principle your suggested change ends up providing the below updated behaviour for PREEMPT_RT. *) If the idle states are described with the non-hierarchical layout, all idle states are still available. **) If the idle states are described with the hierarchical layout, only the idle states defined per CPU are available, but not the ones being shared among a group of CPUs (aka cluster idle states). Perhaps there is an easier way to summarize what I stated above? > + > config ARM_PSCI_CPUIDLE_DOMAIN > bool "PSCI CPU idle Domain" > depends on ARM_PSCI_CPUIDLE > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c > index 312a34ef28dc..c25592718984 100644 > --- a/drivers/cpuidle/cpuidle-psci.c > +++ b/drivers/cpuidle/cpuidle-psci.c > @@ -66,7 +66,7 @@ static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev, > /* Do runtime PM to manage a hierarchical CPU toplogy. */ > if (s2idle) > dev_pm_genpd_suspend(pd_dev); > - else > + else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) Rather than doing this (and the below) in __psci_enter_domain_idle_state(), I suggest replacing this with a bailout point in psci_dt_cpu_init_topology(). That would prevent the __psci_enter_domain_idle_state() from being called altogether, which is really what we need. Moreover, I think it would make sense to set the GENPD_FLAG_ALWAYS_ON for the corresponding genpd, when CONFIG_PREEMPT_RT is set. See psci_pd_init(). > pm_runtime_put_sync_suspend(pd_dev); > > state = psci_get_domain_state(); > @@ -77,7 +77,7 @@ static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev, > > if (s2idle) > dev_pm_genpd_resume(pd_dev); > - else > + else if (!IS_ENABLED(CONFIG_PREEMPT_RT)) > pm_runtime_get_sync(pd_dev); > > cpu_pm_exit(); > -- Kind regards Uffe