On Mon, Apr 30, 2012 at 1:09 PM, Colin Cross <ccross@xxxxxxxxxxx> wrote:
> On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
> cpus cannot be independently powered down, either due to
> sequencing restrictions (on Tegra 2, cpu 0 must be the last to
> power down), or due to HW bugs (on OMAP4460, a cpu powering up
> will corrupt the gic state unless the other cpu runs a work
> around). Each cpu has a power state that it can enter without
> coordinating with the other cpu (usually Wait For Interrupt, or
> WFI), and one or more "coupled" power states that affect blocks
> shared between the cpus (L2 cache, interrupt controller, and
> sometimes the whole SoC). Entering a coupled power state must
> be tightly controlled on both cpus.
>
> The easiest solution to implementing coupled cpu power states is
> to hotplug all but one cpu whenever possible, usually using a
> cpufreq governor that looks at cpu load to determine when to
> enable the secondary cpus. This causes problems, as hotplug is an
> expensive operation, so the number of hotplug transitions must be
> minimized, leading to very slow response to loads, often on the
> order of seconds.
>
> This patch series implements an alternative solution, where each
> cpu will wait in the WFI state until all cpus are ready to enter
> a coupled state, at which point the coupled state function will
> be called on all cpus at approximately the same time.
>
> Once all cpus are ready to enter idle, they are woken by an smp
> cross call. At this point, there is a chance that one of the
> cpus will find work to do, and choose not to enter suspend. A
> final pass is needed to guarantee that all cpus will call the
> power state enter function at the same time. During this pass,
> each cpu will increment the ready counter, and continue once the
> ready counter matches the number of online coupled cpus. If any
> cpu exits idle, the other cpus will decrement their counter and
> retry.
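
The final pass described above can be sketched in user-space C. This is only an illustration, assuming C11 atomics stand in for the kernel's atomic counters. The struct, enum, and function names below are hypothetical and do not come from the patch series. The sketch does not handle a cpu aborting after another cpu has already observed a full ready count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the state shared by coupled cpus. */
struct coupled_group {
	int online_count;          /* number of online coupled cpus */
	atomic_int ready_count;    /* cpus committed to the coupled state */
	atomic_bool abort_pending; /* set by a cpu that found work to do */
};

enum coupled_poll {
	COUPLED_WAIT,   /* keep spinning: not every cpu is ready yet */
	COUPLED_ENTER,  /* all cpus ready: call the coupled enter function */
	COUPLED_RETRY,  /* a cpu exited idle: we backed out, start over */
};

static void coupled_init(struct coupled_group *g, int online_count)
{
	assert(online_count > 0);
	g->online_count = online_count;
	atomic_init(&g->ready_count, 0);
	atomic_init(&g->abort_pending, false);
}

/* Called by a cpu at the start of the final pass. */
static void coupled_set_ready(struct coupled_group *g)
{
	atomic_fetch_add(&g->ready_count, 1);
}

/* Called by a cpu that backs out of the final pass. */
static void coupled_set_not_ready(struct coupled_group *g)
{
	atomic_fetch_sub(&g->ready_count, 1);
}

/*
 * Called with true by a cpu that exits idle, and with false once that
 * cpu is prepared to idle again (an assumption of this sketch).
 */
static void coupled_set_abort(struct coupled_group *g, bool aborting)
{
	atomic_store(&g->abort_pending, aborting);
}

/* One iteration of the spin loop run by a cpu that has set itself ready. */
static enum coupled_poll coupled_poll_ready(struct coupled_group *g)
{
	if (atomic_load(&g->abort_pending)) {
		coupled_set_not_ready(g); /* decrement our count and retry */
		return COUPLED_RETRY;
	}
	if (atomic_load(&g->ready_count) == g->online_count)
		return COUPLED_ENTER;
	return COUPLED_WAIT;
}
```

In this sketch each cpu would call coupled_set_ready() once and then call coupled_poll_ready() in a loop until it returns something other than COUPLED_WAIT. The poll is kept non-blocking so the logic can be stepped through from a single thread.
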
>
> To use coupled cpuidle states, a cpuidle driver must:
>
>    Set struct cpuidle_device.coupled_cpus to the mask of all
>    coupled cpus, usually the same as cpu_possible_mask if all cpus
>    are part of the same cluster. The coupled_cpus mask must be
>    set in the struct cpuidle_device for each cpu.
>
>    Set struct cpuidle_device.safe_state to a state that is not a
>    coupled state. This is usually WFI.
>
>    Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
>    state that affects multiple cpus.
>
>    Provide a struct cpuidle_state.enter function for each state
>    that affects multiple cpus. This function is guaranteed to be
>    called on all cpus at approximately the same time. The driver
>    should ensure that the cpus all abort together if any cpu tries
>    to abort once the function is called.
>
> This series has been tested by implementing a test cpuidle state
> that uses the parallel barrier helper function to verify that
> all cpus call the function at the same time.
>
> This patch set has a few disadvantages over the hotplug governor,
> but I think they are all fairly minor:
>  * Worst-case interrupt latency can be increased. If one cpu
>    receives an interrupt while the other is spinning in the
>    ready_count loop, the second cpu will be stuck with
>    interrupts off until the first cpu finishes processing
>    its interrupt and exits idle. This will increase the worst
>    case interrupt latency by the worst-case interrupt processing
>    time, but should be very rare.
>  * Interrupts are processed while still inside pm_idle.
>    Normally, interrupts are only processed at the very end of
>    pm_idle, just before it returns to the idle loop. Coupled
>    states require processing interrupts inside
>    cpuidle_enter_state_coupled in order to distinguish between
>    the smp_cross_call from another cpu that is now idle and an
>    interrupt that should cause idle to exit.
>    I don't see a way to fix this without either being able to
>    read the next pending irq from the interrupt chip, or
>    querying the irq core for which interrupts were processed.
>  * Since interrupts are processed inside cpuidle, the next
>    timer event could change. The new timer event will be
>    handled correctly, but the idle state decision made by
>    the governor will be out of date, and will not be revisited.
>    The governor select function could be called again every time,
>    but this could lead to a lot of work being done by an idle
>    cpu if the other cpu was mostly busy.
>
> v2:
>  * removed the coupled lock, replacing it with atomic counters
>  * added a check for outstanding pokes before beginning the
>    final transition to avoid extra wakeups
>  * made the cpuidle_coupled struct completely private
>  * fixed kerneldoc comment formatting
>  * added a patch with a helper function for resynchronizing
>    cpus after aborting idle
>  * added a patch (not for merging) to add trace events for
>    verification and performance testing
>
> v3:
>  * rebased on v3.4-rc4 by Santosh
>  * fixed decrement in cpuidle_coupled_cpu_set_alive
>  * updated tracing patch to remove unnecessary debugging so
>    it can be merged
>  * made tracing _rcuidle
>
> This series has been tested and reviewed by Santosh and Kevin
> for OMAP4, which has a cpuidle series ready for 3.5, and Tegra
> and Exynos5 patches are in progress. I think this is ready to
> go in. Lean, are you maintaining a cpuidle tree for linux-next?

Sorry, *Len.

> If not, I can publish a tree for linux-next, or this could go in
> through Arnd's tree.
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/linux-pm