Hi,
Some comments on the opportunistic suspend portion of this patch.
On Fri, 30 Apr 2010, Arve Hjønnevåg wrote:
> Adds /sys/power/policy that selects the behaviour of /sys/power/state.
> After setting the policy to opportunistic, writes to /sys/power/state
> become non-blocking requests that specify which suspend state to enter
> when no suspend blockers are active. A special state, "on", stops the
> process by activating the "main" suspend blocker.
>
> Signed-off-by: Arve Hjønnevåg <arve@xxxxxxxxxxx>
> ---
> Documentation/power/opportunistic-suspend.txt | 119 +++++++++++
> include/linux/suspend_blocker.h | 64 ++++++
> kernel/power/Kconfig | 16 ++
> kernel/power/Makefile | 1 +
> kernel/power/main.c | 92 ++++++++-
> kernel/power/power.h | 9 +
> kernel/power/suspend.c | 4 +-
> kernel/power/suspend_blocker.c | 261 +++++++++++++++++++++++++
> 8 files changed, 559 insertions(+), 7 deletions(-)
> create mode 100644 Documentation/power/opportunistic-suspend.txt
> create mode 100755 include/linux/suspend_blocker.h
> create mode 100644 kernel/power/suspend_blocker.c
While reading the patch, it seemed that this opportunistic suspend
mechanism is a degenerate case of a system-wide form of CPUIdle. I'll
call it "SystemIdle," for lack of a better term. The main difference
between this and the CPUIdle design is that this patch's SystemIdle is
not integrated with any of the existing Linux mechanisms for limiting
idle duration or depth. This is a major problem with the current
patch and should prevent it from being merged in its current state.
But if some SystemIdle-type code were to be written that did work with
the rest of the Linux kernel, it would be very useful.
To take an example from the current Linux-OMAP kernels: right now, the
Linux-OMAP code handles system-level idle through CPUIdle.[1] To
borrow ACPI terminology, some of the S-states are implemented as
C-states. That isn't right, for several reasons:
- There are other constraints on system-level idle modes beyond those
that apply to the CPU. To take an OMAP example, on boards that
support it, OMAP2+ chips can cut power to the external SoC
high-frequency clock oscillator ("HF oscillator").[2] (This is the
clock that is later used to drive the CPU clock, bus clock, etc., on
the SoC, but it can also be used to drive other chips on the board,
such as external audio codec chips, GPS receivers, etc.) If power
is cut to the HF oscillator, it can take several milliseconds to
stabilize once power is reapplied. Part of the decision as to
whether to cut power to the HF oscillator is a classic idle
balancing test between power economy and wakeup latency. But this
occurs on a system level - the CPU may not even be involved.
Consider a low-power audio playback use-case. The CPU may be only
rarely involved, but other devices on the SoC may need to be active.
For example, if a large number of audio samples are loaded into main
memory, the CPU can go into a very deep sleep state while the DMA
controller transfers the samples to the audio serial interface
device. But the DMA controller and audio serial interface need to
run occasionally, so depending on FIFO depths in the system, the HF
oscillator clock may need to be kept on, even if the CPU idle level
would suggest that it could be disabled.
Additionally, other chips outside the SoC may be clocked
by that HF oscillator. Continuing the low-power audio playback
use-case, the external audio codec chip may also be clocked from the
HF oscillator. If the HF oscillator is cut when the SoC is idle,
but the audio codec has samples in its buffer, audio playback will
be disrupted.
- There should be only one system-level governor. Since all of the
production OMAPs so far have been single-CPU systems, we've been
able to get away with using the CPUIdle governor for system-level
idle decisions. But on multi-CPU systems, there is one CPUIdle
governor per CPU, so reusing the CPUIdle governor for this won't
work the same way as it has for us in the past.
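To make the HF oscillator example concrete, here is a rough user-space
sketch in C of the kind of system-level test I have in mind. Every name
in it (struct latency_constraint, hf_osc_can_power_off(), the
microsecond fields) is made up for illustration; none of it comes from
the patch or from an existing kernel interface. The point is only that
the inputs are device constraints, not CPU idleness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one wakeup-latency constraint held by a device or driver. */
struct latency_constraint {
    const char *holder;      /* e.g. "audio-dma" or "ext-codec" */
    uint32_t max_wakeup_us;  /* longest clock outage the holder tolerates */
};

/*
 * Return true if the HF oscillator may be powered off: every active
 * constraint must tolerate the oscillator's restart (stabilization) time.
 * CPU idleness is deliberately not an input here.
 */
bool hf_osc_can_power_off(const struct latency_constraint *c, size_t n,
                          uint32_t osc_restart_us)
{
    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (c[i].max_wakeup_us < osc_restart_us)
            return false;
    }
    return true;
}
```

In the audio playback case, the DMA controller and the external codec
would each hold a constraint derived from their FIFO depths, and the
oscillator would stay on whenever its restart time exceeds the tightest
of them, no matter how deeply the CPU is sleeping.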
So some good SystemIdle-type code would be really useful for us. But a
significant part of what "good" means is that such a SystemIdle needs to
be driven from the bottom up, rather than from the top down. In other
words, both the 'policy' (i.e., the choice of metrics to use to determine
the system idle state), and the 'implementation of that policy' (i.e., the
way to enter those idle states), should be left up to the underlying chip
architecture code that would implement SystemIdle client drivers. (In
this regard, it would differ even further from CPUIdle, which hardcodes
the policy.)
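As a rough sketch of what "bottom up" could look like, the following
user-space C fragment splits the two responsibilities into callbacks
supplied by the chip code, with the core doing nothing but plumbing.
The names (struct sysidle_driver, sysidle_idle(), the demo_* chip
driver and its states) are hypothetical and exist only for this
illustration; this is not a proposed API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface: the core owns no policy, only the plumbing. */
struct sysidle_driver {
    /* Policy: chip code picks a system idle state from its own metrics. */
    int (*select_state)(void *chip_data);
    /* Implementation: chip code enters the state that was picked. */
    int (*enter_state)(void *chip_data, int state);
    void *chip_data;
};

/* Core entry point: delegate both decisions to the registered driver. */
int sysidle_idle(const struct sysidle_driver *drv)
{
    assert(drv && drv->select_state && drv->enter_state);
    int state = drv->select_state(drv->chip_data);
    return drv->enter_state(drv->chip_data, state);
}

/* Made-up chip driver: keeps a shallower state while audio is playing. */
struct demo_chip {
    int audio_active;
    int last_state;
};

enum { DEMO_STATE_RETENTION = 1, DEMO_STATE_OFF = 2 };

int demo_select(void *data)
{
    struct demo_chip *chip = data;
    return chip->audio_active ? DEMO_STATE_RETENTION : DEMO_STATE_OFF;
}

int demo_enter(void *data, int state)
{
    struct demo_chip *chip = data;
    chip->last_state = state;  /* stand-in for real hardware programming */
    return 0;
}
```

Nothing in the core knows what the states mean or which metrics were
consulted, which is the property I am arguing for.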
This way, at least initially, there will be minimal top-down
restrictions on what architectures need to do to implement their
chips' power management features. As individual architectures
implement working code and common features are identified, the
architecture developers can collaborate, and common components can be
proposed for merging into the top-level SystemIdle code.
Such a SystemIdle implementation should also work for Android's
opportunistic suspend code. Google developers could choose the system
metrics that they wish to use to control system idle levels. If they
want to ignore timers and the scheduler, that's fine - that design
decision can be confined to their policy driver, and the suspend-block
code can be activated the moment that some driver sets a PM
constraint, rather than doing anything more refined. Similarly,
Google can change the way that the policy decisions are implemented.
Google could use a driver that does not take any advantage of
fine-grained power management aside from CPUIdle. This driver could
simply enter full system suspend whenever the policy driver authorizes
it to do so.
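For illustration, an Android-style policy of this kind could be as
coarse as a single counter. The sketch below is user-space C with
hypothetical names (struct coarse_policy and the coarse_policy_*()
helpers); it is not taken from the suspend blocker patch and makes no
claim about how that code is structured.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical coarse policy: any PM constraint at all blocks suspend. */
struct coarse_policy {
    unsigned int active_constraints;
};

/* A driver sets a constraint: suspend is blocked from this moment. */
void coarse_policy_get(struct coarse_policy *p)
{
    p->active_constraints++;
}

/* A driver drops its constraint. */
void coarse_policy_put(struct coarse_policy *p)
{
    assert(p->active_constraints > 0);  /* unbalanced put is a caller bug */
    p->active_constraints--;
}

/* True when the state driver may enter full system suspend. */
bool coarse_policy_may_suspend(const struct coarse_policy *p)
{
    return p->active_constraints == 0;
}
```

A more refined policy driver could weigh the individual constraints
instead of just counting them, without the state driver or the core
having to change.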
This approach - or some similar approach - should allow Android to do what
it needs, while still allowing other, more finely-grained power management
approaches to do something different.
regards,
- Paul
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/linux-pm