Re: [PATCH 1/8] PM: Add suspend block api.

Paul Walmsley <paul@xxxxxxxxx> · Thu, 13 May 2010 13:01:33 -0600 (MDT)

Hi,

Some comments on the opportunistic suspend portion of this patch.

On Fri, 30 Apr 2010, Arve Hjønnevåg wrote:

> Adds /sys/power/policy that selects the behaviour of /sys/power/state.
> After setting the policy to opportunistic, writes to /sys/power/state
> become non-blocking requests that specify which suspend state to enter
> when no suspend blockers are active. A special state, "on", stops the
> process by activating the "main" suspend blocker.
> 
> Signed-off-by: Arve Hjønnevåg <arve@xxxxxxxxxxx>
> ---
>  Documentation/power/opportunistic-suspend.txt |  119 +++++++++++
>  include/linux/suspend_blocker.h               |   64 ++++++
>  kernel/power/Kconfig                          |   16 ++
>  kernel/power/Makefile                         |    1 +
>  kernel/power/main.c                           |   92 ++++++++-
>  kernel/power/power.h                          |    9 +
>  kernel/power/suspend.c                        |    4 +-
>  kernel/power/suspend_blocker.c                |  261 +++++++++++++++++++++++++
>  8 files changed, 559 insertions(+), 7 deletions(-)
>  create mode 100644 Documentation/power/opportunistic-suspend.txt
>  create mode 100755 include/linux/suspend_blocker.h
>  create mode 100644 kernel/power/suspend_blocker.c

While reading the patch, it seemed that this opportunistic suspend
mechanism is a degenerate case of a system-wide form of CPUIdle.  I'll
call it "SystemIdle," for lack of a better term.  The main difference
between this and the CPUIdle design is that this patch's SystemIdle is
not integrated with any of the existing Linux mechanisms for limiting
idle duration or depth.  This is a major problem with the current
patch and should prevent it from being merged in its current state.
But if some SystemIdle-type code were to be written that did work with
the rest of the Linux kernel, it would be very useful.

To take an example from the current Linux-OMAP kernels: right now, the
Linux-OMAP code handles system-level idle through CPUIdle.[1] To
borrow ACPI terminology, some of the S-states are implemented as
C-states.  That isn't right, for several reasons:

- There are other constraints on system-level idle modes beyond those
  that apply to the CPU.  To take an OMAP example, on boards that
  support it, OMAP2+ chips can cut power to the external SoC
  high-frequency clock oscillator ("HF oscillator").[2] (This is the
  clock that is later used to drive the CPU clock, bus clock, etc., on
  the SoC, but it can also be used to drive other chips on the board,
  such as external audio codec chips, GPS receivers, etc.)  If power
  is cut to the HF oscillator, it can take several milliseconds to
  stabilize once power is reapplied.  Part of the decision as to
  whether to cut power to the HF oscillator is a classic idle
  balancing test between power economy and wakeup latency.  But this
  occurs on a system level - the CPU may not even be involved.

  Consider a low-power audio playback use-case.  The CPU may be only
  rarely involved, but other devices on the SoC may need to be active.
  For example, if a large number of audio samples are loaded into main
  memory, the CPU can go into a very deep sleep state while the DMA
  controller transfers the samples to the audio serial interface
  device.  But the DMA controller and audio serial interface need to
  run occasionally, so depending on FIFO depths in the system, the HF
  oscillator clock may need to be kept on, even if the CPU idle level
  would suggest that it could be disabled.

  Additionally, other external chips outside the SoC may be clocked
  by that HF oscillator.  Continuing the low-power audio playback
  use-case, the external audio codec chip may also be clocked from the
  HF oscillator.  If the HF oscillator is cut when the SoC is idle,
  but the audio codec has samples in its buffer, audio playback will
  be disrupted.

- There should be only one system-level governor.  Since all of the
  production OMAPs so far have been single-CPU systems, we've been
  able to get away with using the CPUIdle governor for this.  But on
  multi-CPU systems, there is one CPUIdle governor per CPU.  So reusing
  the CPUIdle governor to do this won't work the same way as it has
  for us in the past.
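
To make the HF oscillator case concrete, here is a rough sketch of the
kind of system-level test I have in mind.  It is plain C for
illustration only: struct hf_osc_user, hf_osc_can_power_off() and the
latency fields are names invented for this example and do not exist in
the kernel or in the Linux-OMAP tree.  The point is that the decision
depends on the devices clocked from the oscillator, not on the CPU's
idle state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one entry per device clocked from the HF oscillator. */
struct hf_osc_user {
	const char *name;
	bool active;            /* device currently needs its clock */
	uint32_t max_outage_us; /* longest clock outage it can tolerate */
};

/*
 * Hypothetical system-level test: the oscillator may be powered off
 * only if every active user can tolerate the time the oscillator
 * needs to stabilize after power is reapplied.  The CPU idle state
 * does not appear here at all.
 */
static bool hf_osc_can_power_off(const struct hf_osc_user *users, size_t n,
				 uint32_t osc_restart_us)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (users[i].active && users[i].max_outage_us < osc_restart_us)
			return false;
	}
	return true;
}
```

In the audio playback case, an active codec with a shallow buffer would
veto the power-off even while the CPU sits in its deepest sleep state.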

So some good SystemIdle-type code would be really useful for us.  But a 
significant part of what "good" means is that such a SystemIdle needs to 
be driven from the bottom up, rather than from the top down. In other 
words, both the 'policy' (i.e., the choice of metrics to use to determine 
the system idle state), and the 'implementation of that policy' (i.e., the 
way to enter those idle states), should be left up to the underlying chip 
architecture code that would implement SystemIdle client drivers.  (In 
this regard, it would differ even further from CPUIdle, which hardcodes 
the policy.)
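
As a rough illustration of what "bottom up" could look like, the sketch
below leaves both hooks to the client driver.  Everything in it is
hypothetical: struct sysidle_driver, sysidle_enter_idle() and the
example_* names are invented here and are not an existing or proposed
kernel interface.  The core only asks the driver which state to use
and then asks the driver to enter it.

```c
#include <stddef.h>

/* Hypothetical: a system idle state as described by chip code. */
struct sysidle_state {
	const char *name;
	unsigned int exit_latency_us;
};

/* Hypothetical client driver: policy and implementation both live here. */
struct sysidle_driver {
	const struct sysidle_state *states; /* ordered shallow to deep */
	size_t nr_states;
	/* policy: pick a state using whatever metrics the arch chooses */
	size_t (*select)(const struct sysidle_driver *drv, const void *arch_data);
	/* implementation: enter the state; returns 0 on success */
	int (*enter)(const struct sysidle_driver *drv, size_t idx);
};

/* The core imposes no policy of its own; it only dispatches. */
static int sysidle_enter_idle(const struct sysidle_driver *drv,
			      const void *arch_data)
{
	size_t idx = drv->select(drv, arch_data);

	if (idx >= drv->nr_states)
		return -1;
	return drv->enter(drv, idx);
}

/* Example arch policy: deepest state whose exit latency fits a bound. */
static size_t example_select(const struct sysidle_driver *drv,
			     const void *arch_data)
{
	unsigned int max_latency_us = *(const unsigned int *)arch_data;
	size_t i, best = 0;

	for (i = 0; i < drv->nr_states; i++) {
		if (drv->states[i].exit_latency_us <= max_latency_us)
			best = i;
	}
	return best;
}

static size_t example_last_entered = (size_t)-1;

static int example_enter(const struct sysidle_driver *drv, size_t idx)
{
	(void)drv;
	example_last_entered = idx; /* real code would program the hardware */
	return 0;
}

static const struct sysidle_state example_states[] = {
	{ "on", 0 }, { "retention", 300 }, { "off", 5000 },
};

static const struct sysidle_driver example_driver = {
	.states = example_states,
	.nr_states = 3,
	.select = example_select,
	.enter = example_enter,
};
```

A different architecture could plug in a completely different select()
hook without the core needing to know which metrics it uses.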

This way, at least initially, there will be minimal top-down
restrictions on what architectures need to do to implement their
chips' power management features.  As common features are identified,
by individual architectures implementing working code, then the
architecture developers can collaborate, and common components can be
proposed for merging into the top-level SystemIdle code.

Such a SystemIdle implementation should also work for Android's
opportunistic suspend code.  Google developers could choose the system
metrics that they wish to use to control system idle levels.  If they
want to ignore timers and the scheduler, that's fine - that design
decision can be confined to their policy driver, and the suspend-block
code can be activated the moment that some driver sets a PM
constraint, rather than doing anything more refined.  Similarly,
Google can change the way that the policy decisions are implemented.
Google could use a driver that does not take any advantage of
fine-grained power management aside from CPUIdle.  This driver could
simply enter full system suspend whenever the policy driver authorizes
it to do so.
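
A deliberately coarse policy of that kind might look roughly like the
following.  Again, this is only a sketch under my own assumptions:
struct coarse_policy and the coarse_* functions are made-up names, not
anything from the suspend blocker patch.  It ignores timers and the
scheduler and authorizes full suspend whenever no constraint is held.

```c
/* Hypothetical targets the policy can authorize. */
enum sys_target { SYS_STAY_ON, SYS_FULL_SUSPEND };

/* Hypothetical coarse policy state: a count of held PM constraints. */
struct coarse_policy {
	unsigned int active_constraints;
};

/* A driver sets a constraint: suspend is blocked from this moment. */
static void coarse_constraint_set(struct coarse_policy *p)
{
	p->active_constraints++;
}

/* A driver drops its constraint; extra releases are ignored. */
static void coarse_constraint_release(struct coarse_policy *p)
{
	if (p->active_constraints)
		p->active_constraints--;
}

/* Policy decision: suspend only when nothing holds a constraint. */
static enum sys_target coarse_select(const struct coarse_policy *p)
{
	return p->active_constraints ? SYS_STAY_ON : SYS_FULL_SUSPEND;
}
```

A more finely-grained policy driver could replace coarse_select() with
something that weighs latency or timer metrics, without touching the
code that enters the chosen state.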

This approach - or some similar approach - should allow Android to do what 
it needs, while still allowing other, more finely-grained power management 
approaches to do something different.

regards,

- Paul