[linux-pm] Toward runtime power management in Linux

mochel at digitalimplant.org (Patrick Mochel) · Mon Aug 1 17:58:52 2005

Good summary, though you're missing a few key points and issues. Details
below, but let's begin with this.

We identified three facets of Runtime Power Management:

- Automatic (or Dynamic) Suspend

  This is where a device will enter a low power state after a certain
  amount of time has elapsed without any activity.

  The amount of time should be configurable on a per-device basis through
  a sysfs interface.

  What defines "inactivity" is driver-dependent, and we expect it to be
  class-dependent (some classes may define inactivity as the device not
  being opened; others may define it as having no read()/write() calls;
  still others may define it as not having any network packets besides
  ARP requests; you get the idea). IOW, it doesn't matter to the core.

  The device should enter a low-power state that it supports. It doesn't
  matter what state this is, and it's debatable whether or not it should
  be exported/configurable via sysfs. (It's possible that a
  userspace-based policy may wish to adjust this based on its extra
  knowledge about system features/bugs, or based on the desired
  latency/consumption tradeoff for the particular system or configuration
  (plugged, unplugged, etc.).)

  The device must be powered back on when a new request comes in. Again,
  whether this is on open(), read(), or socket() is up to the class and
  the drivers of that class.

- Directed Suspend

  This is where a user or an application directs a specific device to
  enter a lower power state.

  The states that a device supports (i.e. allows to be set) should be
  exported. The application should specify which state to enter.

  By default, the device may not respond to I/O requests. There must be a
  flag exported to control this.

- Performance

  (This was not covered in detail, but based on the specified interface
  above, it's pretty easy.)

  The states a device (and its driver) support should be exported via
  sysfs.

  A user or application specifies which state to enter.

  This may turn on/off different functional components of the device. The
  drivers need to be prepared to handle that.
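
A minimal userspace sketch of the first two facets, just to make the
moving parts concrete. Every name here (rtpm_dev, rtpm_mark_activity,
rtpm_check_idle, rtpm_directed_suspend) is hypothetical and not an
existing kernel interface. It assumes a plain integer clock, a single
low-power state, and that "may not respond to I/O requests" means a
directed-suspended device refuses requests until the exported flag is
set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; a real device may support several low-power states. */
enum rtpm_state { RTPM_ON, RTPM_LOW_POWER };

struct rtpm_dev {
	enum rtpm_state state;
	unsigned long idle_timeout;  /* per-device; would be set through sysfs; 0 = never */
	unsigned long last_activity; /* time of the last request the class counted */
	bool allow_io_wakeup;        /* exported flag: may I/O wake a directed suspend? */
	bool directed;               /* true if suspended by explicit request */
};

/*
 * Called by the class or driver for whatever it defines as activity
 * (open(), read(), a non-ARP packet, ...). Returns false if the request
 * is refused because the device was suspended on purpose.
 */
bool rtpm_mark_activity(struct rtpm_dev *dev, unsigned long now)
{
	assert(dev);
	if (dev->state == RTPM_LOW_POWER) {
		if (dev->directed && !dev->allow_io_wakeup)
			return false;
		dev->state = RTPM_ON;   /* power back on for the new request */
		dev->directed = false;
	}
	dev->last_activity = now;
	return true;
}

/* Automatic suspend: returns true if the device was just put to sleep. */
bool rtpm_check_idle(struct rtpm_dev *dev, unsigned long now)
{
	assert(dev);
	if (dev->state != RTPM_ON || dev->idle_timeout == 0)
		return false;
	if (now - dev->last_activity < dev->idle_timeout)
		return false;
	dev->state = RTPM_LOW_POWER;
	dev->directed = false;
	return true;
}

/* Directed suspend: a user or application asked for the low-power state. */
void rtpm_directed_suspend(struct rtpm_dev *dev)
{
	assert(dev);
	dev->state = RTPM_LOW_POWER;
	dev->directed = true;
}
```

The core only sees timestamps and a timeout; what counts as "activity"
stays entirely in whoever calls rtpm_mark_activity().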

To address your points directly:

On Sat, 30 Jul 2005, Alan Stern wrote:

> 	Basic ideas
>
> To start with, we've already agreed that drivers should maintain some
> set of states for their devices.  What these states should be, how
> they are implemented and exported to userspace, and whether they are
> defined by the device driver or by the bus subsystem, are all beyond
> the scope of this document.  We only need to know that there is a
> userspace API (presumably involving sysfs) for requesting state
> changes.

The definition is beyond the scope, but the interface for manipulating
them, both internally and externally, is well within the scope. There must
be a simple interface for drivers to export this information to userspace.

To get this interface, we need to know (tangentially, not in this thread)
what these states are for various devices and what they do.

[ Yes, we need to focus on the details of the General Core, but in order
to do it correctly, we need all of the gruesome details of the devices we
must support. ]

> We have also agreed that runtime power state changes need to bubble up
> the device tree.  To handle this, drivers for interior nodes in the
> tree can define "link states".

Ah, shame on you - a USBism. Unfortunately, it doesn't make much sense to
those not familiar with the context.

> When a driver changes a device's state, it will notify all of the
> power parents about link state changes while doing so.

> These notifications are one-way, child-to-parent only.  We don't need
> pre- and post-notifications; each message will inform the parent of a
> single link-state change, which the parent will then carry out.

This may not be true. There could be a race between children notifying
their parent of state changes if e.g. the states of two children are
opposite and in the process of inverting (there is one suspended one that
tries to resume, and one resumed device that tries to suspend).

I haven't thought this all the way through, so it might just represent a
glitch in performance rather than a potential error condition. Regardless,
it's an implementation detail that I don't want to get bogged down with
right now.
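
For what it's worth, here is one way to picture that race. This is a
sketch under assumptions, not a proposal: the parent keeps a count of
children that need it up, and the names (power_parent,
parent_child_resuming, parent_child_suspended) are made up for
illustration. Callers are assumed to hold the parent's lock, which the
sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical parent-side bookkeeping for child notifications. */
struct power_parent {
	int active_children; /* children that currently need the parent up */
	bool powered;
};

/* A child is about to resume; the parent must be up first. */
void parent_child_resuming(struct power_parent *p)
{
	assert(p);
	if (p->active_children++ == 0)
		p->powered = true;
}

/* A child has finished suspending; the last one out powers down. */
void parent_child_suspended(struct power_parent *p)
{
	assert(p && p->active_children > 0);
	if (--p->active_children == 0)
		p->powered = false;
}
```

With one child resuming and another suspending, either ordering leaves
the parent powered with a count of one. The only difference is that the
suspend-first ordering briefly powers the parent down and back up, which
is the "glitch in performance" case rather than an error.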

> 	Practical considerations
>
> Power-parent relations: How should we represent the extra power
> parent-child relationships that aren't present in the device tree?
> Would it be enough to give each device that needs it a subdirectory in
> sysfs with symlinks to its power parents?  Do we also need a symlink
> from each parent to the child?

We decided in Ottawa to create a completely separate hierarchy for power
relationships. This will coexist with the device tree and will, by virtue
of the kernel infrastructure, be automatically represented in sysfs. There
will of course be symlinks connecting the physical tree with the power
tree.

The fact that some devices will have multiple ancestral dependencies is a
problem that we must solve with the right data structure. I'm not sure of
the best way to do it, but it shouldn't be too hard.
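
Purely as an illustration of the multiple-ancestor problem, a node in
such a power hierarchy could carry a small array of parents instead of
a single pointer. The type and function names below (power_node,
power_node_add_parent, power_node_resume) and the fixed limit of four
parents are assumptions for the sketch, not a decided design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_POWER_PARENTS 4 /* arbitrary limit for the sketch */

/* Hypothetical node in a power hierarchy kept apart from the device tree. */
struct power_node {
	const char *name;
	bool powered;
	size_t nr_parents;
	struct power_node *parents[MAX_POWER_PARENTS];
};

/* Record that child depends on parent for power. Returns -1 when full. */
int power_node_add_parent(struct power_node *child, struct power_node *parent)
{
	assert(child && parent);
	if (child->nr_parents == MAX_POWER_PARENTS)
		return -1;
	child->parents[child->nr_parents++] = parent;
	return 0;
}

/*
 * Power on every ancestor before the node itself. Assumes the graph has
 * no cycles and that a powered node already has all its parents powered.
 */
void power_node_resume(struct power_node *node)
{
	assert(node);
	if (node->powered)
		return;
	for (size_t i = 0; i < node->nr_parents; i++)
		power_node_resume(node->parents[i]);
	node->powered = true;
}
```

A device that hangs off both a bus and a separate clock source would
simply add both as parents, and resuming it brings both up first.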

> RTPM core: The scheme described above doesn't necessarily involve the
> PM core.  The notifications can be simple subroutine calls, perhaps
> with support from the bus subsystem.  It's not obvious how much core
> support we will need for RTPM, apart from the sysfs interface.

Exactly. All of this is device-, driver-, bus-, and class-dependent. It
is not the core's job to factor in every case to its infrastructure.
Instead, it should serve as

	a) a library for subsystems to use for common tasks
	b) a conduit for information between subsystems and sysfs

> Recursion: A consequence of doing things this way is that the
> notifications can potentially use a lot of stack space as they
> progress up the device tree.

This is an important design consideration.

> Order of locking: The general rule, needed to prevent deadlock, for
> acquiring the device semaphores is this: Don't lock a device if you
> already hold one of its descendants' locks.

So is this.

These are limitations of what we have now, but they don't have to be.
Ideally, we will have a solution that works flawlessly under the
current constraints. Otherwise, we should (very carefully) examine ways in
which we can adjust the current Core to better handle what we need to do.
I don't want to redesign the Driver Core on a whim; only under specific
requirements.
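
To show that both constraints can be met at once, here is a rough
sketch of resuming a chain of devices without recursion while never
taking an ancestor's lock with a descendant's lock held. It assumes a
single-parent chain, and dev_node, resume_chain, and the depth limit of
16 are invented for the example. A plain flag stands in for the
per-device semaphore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 16 /* arbitrary bound on tree depth for the sketch */

/* Hypothetical device with one parent; 'locked' stands in for the semaphore. */
struct dev_node {
	struct dev_node *parent;
	bool locked;
	bool powered;
};

/*
 * Resume dev and all of its ancestors. The path is collected first, then
 * walked from the root down, so stack use is a fixed array rather than
 * one frame per level. Returns -1 if the chain is deeper than MAX_DEPTH.
 */
int resume_chain(struct dev_node *dev)
{
	struct dev_node *path[MAX_DEPTH];
	size_t depth = 0;

	assert(dev);
	for (struct dev_node *d = dev; d; d = d->parent) {
		if (depth == MAX_DEPTH)
			return -1;
		path[depth++] = d;
	}

	/* path[depth - 1] is the root; work downwards toward dev. */
	while (depth-- > 0) {
		struct dev_node *d = path[depth];

		assert(!d->locked);
		d->locked = true;   /* no descendant lock is held here */
		d->powered = true;
		d->locked = false;  /* drop it before moving to the child */
	}
	return 0;
}
```

The cost is a second pass over the path; whether that beats simply
recursing depends on how deep real device trees get.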

> Context: A relatively recent change to the driver model core added a
> semaphore to struct device, and we will want to hold this semaphore
> while making state changes.  This immediately implies that RTPM needs
> a process context to run in.

Absolutely. We're going to be in process context when receiving requests
from sysfs, too.

> Idle-timeout RTPM: We certainly should have an API whereby userspace
> can inform the kernel of an idle-timeout value to use for
> autosuspending.  (In principle there could be multiple timeout values,
> for successively deeper levels of power saving.)  This cries out to be
> managed in a centralized way rather than letting each driver have its
> own API.  It's not so clear what the most efficient implementation
> will be.  Should every device have its own idle-timeout kernel timer?
> (That's a lot of kernel timers.)  Or should the RTPM kernel thread
> wake up every second to scan a list of devices that may have exceeded
> their idle timeouts?

This is definitely a design consideration. However, I don't want to design
any infrastructure like this now because we don't know how people are
going to use it. We want some early adopters to start working with the
concepts and implementing their own versions. Once we get an idea of how
to abstract it, maybe we can start providing some library functions.

> Userspace support: It's easy to see how userspace could use sysfs to
> request a single device state change.  But what if the user wants to
> suspend an entire subtree?

We need some sort of support for it. I'm inclined to say that it must go
in the kernel because of ordering and locking considerations. But it's
too early to comment specifically on it.

That's it for now.

Thanks,

	Pat