[linux-pm] Some thoughts on suspend/resume development

abelay at novell.com (Adam Belay) · Tue Mar 8 20:12:20 2005

On Tue, Mar 08, 2005 at 09:13:00AM -0800, Patrick Mochel wrote:
> 
> > An important difference between system sleep and selective suspend is
> > that with selective suspend, we generally expect the device to resume
> > on demand.  This demand may take the form of a request to the driver
> > (e.g., a block I/O request for a disk device) or a resume request from
> > the device itself (e.g., a notification from a mouse that has just
> > been moved).  This means that input queues must not be plugged and
> > device interrupts must remain enabled, exactly the opposite of what
> > happens during system sleep.  For this reason it is vital for drivers
> > to know whether a suspend call is invoking a system sleep or a
> > selective suspend.  Hence I propose that a new pm_message_t event code,
> > PMSG_SELECTIVE (or maybe PMSG_SELECTIVE_SUSPEND), be used for selective
> > suspends.
> 
> I +/- agree, though I think there also must be a way to completely suspend
> the device, like when you are doing a system suspend.

For runtime power management, we need a way of tracking device usage.
In-kernel timers could disable devices that aren't being used, and then re-enable
them when they are needed again.  This would have to be executed in such a way
that userspace isn't interrupted.  Each class of device would have a different
policy and a different method of determining usage.  A subsystem to help track
time since last usage of the device would be helpful.  I'm concerned that
updating the time after every read or access could have a negative impact on
performance.  It would be nice if we could discuss how this might be
implemented.
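
As a starting point for that discussion, here is a rough sketch of usage
tracking in plain userspace C, not kernel code.  All names (struct
usage_tracker, usage_touch, usage_check_idle) are hypothetical, and the tick
counter stands in for something like jiffies.  To keep the hot path cheap, the
timestamp is only stored when the tick has actually changed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device usage record; not an existing kernel structure. */
struct usage_tracker {
	unsigned long last_used;	/* tick of the most recent access */
	unsigned long timeout;		/* idle ticks allowed before suspending */
	bool suspended;
};

void usage_init(struct usage_tracker *t, unsigned long now,
		unsigned long timeout)
{
	assert(timeout > 0);
	t->last_used = now;
	t->timeout = timeout;
	t->suspended = false;
}

/*
 * Called on the access path.  Returns true if the device was suspended and
 * had to be woken first (here that only means clearing a flag).  The stores
 * are skipped when nothing changed, so a burst of accesses within one tick
 * writes the record once.
 */
bool usage_touch(struct usage_tracker *t, unsigned long now)
{
	bool need_resume = t->suspended;

	if (t->last_used != now)
		t->last_used = now;
	if (need_resume)
		t->suspended = false;
	return need_resume;
}

/*
 * Called from a periodic timer.  Returns true if the device has just been
 * marked suspended because it sat idle for at least `timeout` ticks.
 */
bool usage_check_idle(struct usage_tracker *t, unsigned long now)
{
	if (t->suspended)
		return false;
	/* Unsigned subtraction stays correct if the tick counter wraps. */
	if (now - t->last_used < t->timeout)
		return false;
	t->suspended = true;
	return true;
}
```

The access path does one compare and at most one store per tick, and the
policy decision stays in the timer rather than in every read or write.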

> 
> > With resume-on-demand implemented properly, a driver may decide that
> > it can suspend its device without bothering to suspend the device's
> > children.  This kind of decision should be left to individual drivers
> > and the PM core shouldn't try to enforce a "children must be suspended
> > before their parents" policy for selective suspends.

I'm not sure if I agree that a parent can be suspended without first suspending
its children.  In general, a parent device can be lowered in power only if the
context and operation of the child devices are maintained.  If the change in
state does not affect the operation of child devices, then it really isn't a
"suspend".

> 
> Also true, and even true for system suspend states. While some child
> devices may not support PM, a parent device could, and power down the
> entire bus. It's important that we do descendant-ancestor ordering
> correctly during system suspend transitions. For runtime transitions, we
> need a way for the driver of a parent device to return an error if its
> child devices aren't in a compatible state for it (the parent) to be
> suspended.
> 
> This would be doing something like partial-tree suspends, but I'm not sure
> if this is best done in the kernel or in userspace with a proper tool.

The basic strategy would be to lower the state of each child device, and then
when all children are in a lower state, lower the parent (the power domain) to
the least common denominator power state.  I think this would have to be done
in the kernel because userspace may not be able to operate during this
transition.
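
The "least common denominator" rule can be sketched roughly as follows, in
userspace C.  This assumes D-states ordered D0 (full on) to D3 (off), and
assumes a parent may go no deeper than its shallowest child; real hardware may
have other constraints.  The names struct pm_node, parent_floor and
parent_set_state are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum dstate { D0 = 0, D1, D2, D3 };

/* Hypothetical node in the device tree; not a kernel structure. */
struct pm_node {
	enum dstate state;
	struct pm_node **children;
	size_t nr_children;
};

/*
 * Deepest state the parent may enter: the shallowest state found among its
 * children.  A parent with no children may go all the way to D3.
 */
enum dstate parent_floor(const struct pm_node *parent)
{
	enum dstate deepest = D3;
	size_t i;

	assert(parent != NULL);
	for (i = 0; i < parent->nr_children; i++) {
		if (parent->children[i]->state < deepest)
			deepest = parent->children[i]->state;
	}
	return deepest;
}

/*
 * Try to move the parent to `target`.  Returns 0 on success, or -1 if some
 * child is still in a shallower state than the one requested.
 */
int parent_set_state(struct pm_node *parent, enum dstate target)
{
	if (target > parent_floor(parent))
		return -1;
	parent->state = target;
	return 0;
}
```

Returning an error instead of forcing the children down keeps the policy
decision with whoever requested the transition.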

> 
> > A common problem facing all drivers that do auto suspend is how to set
> > the inactivity timeout.  Two possible answers are: add an attribute
> > file in the /sys/.../power directory (so different devices can have
> > different timeouts), or add a driver module parameter (so all devices
> > using the same driver will have the same timeout).

Each class could have its own policies, complete with timeout values.  In
sysfs, the user could select which policy should be used (e.g. performance,
normal, powersave).
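
A per-class policy table might look something like the sketch below, in
userspace C.  The struct, the lookup function and especially the timeout
values are placeholders invented for illustration; only the three policy
names come from the text above.  The lookup tolerates the trailing newline
that a write to a sysfs file usually carries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy entry for one device class. */
struct class_policy {
	const char *name;
	unsigned int idle_timeout_ms;	/* 0 means never auto-suspend */
};

/* Placeholder timeouts; real values would be chosen per class. */
static const struct class_policy policies[] = {
	{ "performance", 0 },
	{ "normal",      30000 },
	{ "powersave",   5000 },
};

/*
 * Match the text written by the user against the known policy names.
 * Returns the matching entry, or NULL if the name is not recognised.
 */
const struct class_policy *policy_lookup(const char *buf)
{
	size_t len, i;

	assert(buf != NULL);
	len = strcspn(buf, "\n");	/* ignore a trailing newline */
	for (i = 0; i < sizeof(policies) / sizeof(policies[0]); i++) {
		if (strlen(policies[i].name) == len &&
		    strncmp(buf, policies[i].name, len) == 0)
			return &policies[i];
	}
	return NULL;
}
```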

> > For user suspends (made through sysfs) the user may want to convey
> > arbitrary information to a driver, things like which clocks to turn
> > off, which power level to change to, and so on.  This information
> > will vary from driver to driver, and the PM core shouldn't even try to
> > impose any sort of structure on it.  I think the best approach will be
> > to pass to the driver a character pointer giving the data written to
> > /sys/.../power/state, so that users can send whatever they want just
> > by writing it to the file.  This means adding an additional field to
> > pm_message_t.
> 
> Uh, that would really suck. This would entail a string parser in every
> driver, which is what we wanted to get away from with sysfs. A better way
> would be to have a driver export a file with the specific features that it
> supports encoded in a meaningful and efficient way (i.e. a fixed-length
> string, character, or constant).

Agreed.

> 
> 
> 	Pat

Another concern I had is how to relate power states between devices.  The
most standard format seems to be D*, as it is used by PCI, ACPI, and others.
It's not uncommon for a child device to require the parent to be in a
given state for wake events etc.  If the child isn't using the same names
for power states, then how could it express that requirement?  Also, how
could a class-level policy interact with devices that use different state
names?  I may be in favor of only using D-states.

Finally, I'm not sure if I like the current "*probe", "*remove", "*suspend",
and "*resume" for runtime power management.  I think it may be better to do
something like the following:

*attach - allocates data structures, creates sysfs entries, and prepares the
	  driver to handle the hardware.

*start  - sets up device resources and configures the hardware.
	  (mostly physical)

*open   - engages the hardware and allows the class to use it.
	  (logical and physical)

*close  - disengages the hardware and stops access.
	  (logical and physical)

*stop   - disables the hardware at a physical level.
	  (mostly physical)

*detach - tears down the driver and releases it from the "struct device".

*power  - saves and restores state and transitions power.  It could take
	  the current state and the new state as arguments.

The idea here is that if a device could be put into a lower state in which
it isn't operational, but still maintains configuration, then we could just
use "close" and "open".  For complete poweroffs we could do:

"close" -> "stop" to power off, and "start" -> "open" to power back on.

A power state has the following characteristics:
Is the device operational?
Is the context of the device maintained?
Is the configuration of the device maintained?

"start" and "stop" handle configuration, "open" and "close" handle context.
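
The callback set and the two sequences could be sketched like this, in
userspace C.  Everything prefixed demo_ is hypothetical: struct demo_device
stands in for "struct device", and the trace buffer exists only to show the
call order.  Where "power" sits relative to the other callbacks is an
assumption, since the proposal above doesn't pin that down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for "struct device"; the trace only records call order. */
struct demo_device {
	char trace[32];
	size_t len;
};

/* Hypothetical callback table using the names proposed above. */
struct demo_pm_ops {
	int  (*attach)(struct demo_device *dev);
	int  (*start)(struct demo_device *dev);
	int  (*open)(struct demo_device *dev);
	void (*close)(struct demo_device *dev);
	void (*stop)(struct demo_device *dev);
	void (*detach)(struct demo_device *dev);
	int  (*power)(struct demo_device *dev, int cur_state, int new_state);
};

/*
 * Going down: "close" always runs; "stop" runs only when the target state
 * loses configuration.  Calling "power" last is an assumption.
 */
int demo_suspend(const struct demo_pm_ops *ops, struct demo_device *dev,
		 int cur_state, int new_state, bool keeps_config)
{
	assert(ops && dev);
	ops->close(dev);
	if (!keeps_config)
		ops->stop(dev);
	return ops->power(dev, cur_state, new_state);
}

/* Coming back up: the mirror image of demo_suspend(). */
int demo_resume(const struct demo_pm_ops *ops, struct demo_device *dev,
		int cur_state, int new_state, bool kept_config)
{
	int err;

	assert(ops && dev);
	err = ops->power(dev, cur_state, new_state);
	if (!err && !kept_config)
		err = ops->start(dev);
	if (!err)
		err = ops->open(dev);
	return err;
}

/* Trivial driver that logs one letter per callback. */
static void demo_log(struct demo_device *dev, char c)
{
	if (dev->len + 1 < sizeof(dev->trace)) {
		dev->trace[dev->len++] = c;
		dev->trace[dev->len] = '\0';
	}
}

static int demo_start(struct demo_device *dev) { demo_log(dev, 'S'); return 0; }
static int demo_open(struct demo_device *dev) { demo_log(dev, 'O'); return 0; }
static void demo_close(struct demo_device *dev) { demo_log(dev, 'c'); }
static void demo_stop(struct demo_device *dev) { demo_log(dev, 's'); }

static int demo_power(struct demo_device *dev, int cur_state, int new_state)
{
	(void)cur_state;
	(void)new_state;
	demo_log(dev, 'P');
	return 0;
}

/* attach and detach are left unset; they are not part of these sequences. */
const struct demo_pm_ops demo_ops = {
	.start = demo_start, .open = demo_open, .close = demo_close,
	.stop = demo_stop, .power = demo_power,
};
```

With this driver, a light suspend and resume only touches "close", "power"
and "open", while a complete poweroff also goes through "stop" and "start".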

Just some thoughts.  I look forward to any reactions.

Thanks,
Adam