[linux-pm] Some thoughts on suspend/resume development

abelay at novell.com (Adam Belay) · Tue Mar 8 22:02:17 2005

On Wed, 2005-03-09 at 16:44 +1100, Nigel Cunningham wrote:
> Hi!
> 
> On Wed, 2005-03-09 at 15:09, Adam Belay wrote:
> > On Tue, Mar 08, 2005 at 09:13:00AM -0800, Patrick Mochel wrote:
> > > 
> > > > An important difference between system sleep and selective suspend is
> > > > that with selective suspend, we generally expect the device to resume
> > > > on demand.  This demand may take the form of a request to the driver
> > > > (e.g., a block I/O request for a disk device) or a resume request from
> > > > the device itself (e.g., a notification from a mouse that has just
> > > > been moved).  This means that input queues must not be plugged and
> > > > device interrupts must remain enabled, exactly the opposite of what
> > > > happens during system sleep.  For this reason it is vital for drivers
> > > > to know whether a suspend call is invoking a system sleep or a
> > > > selective suspend.  Hence I propose that a new pm_message_t event code,
> > > > PMSG_SELECTIVE (or maybe PMSG_SELECTIVE_SUSPEND), be used for selective
> > > > suspends.
> > > 
> > > I +/- agree, though I think there also must be a way to completely suspend
> > > the device, like when you are doing a system suspend.
> > 
> > For runtime power management, we need a way of tracking device usage.
> > In-kernel timers could disable devices that aren't being used, and then
> > re-enable them when they are needed again.  This would have to be done in
> > such a way that userspace isn't interrupted.  Each class of device would
> > have a different policy and a different method of determining usage.  A
> > subsystem to help track the time since a device was last used would be
> > helpful.  I'm concerned, though, that updating the timestamp after every
> > read or access could have a negative impact on performance.  It would be
> > nice if we could discuss how this might be implemented.
> 
> I think the simplest way to address this will be to let individual
> driver instances track their own usage. After all, who knows better
> whether the driver is being used and when it was last used than the
> driver itself? If the timer is set & reset by the driver, that should
> simplify things immensely. It can update the timer at sensible places,
> such as the completion of requests. In this way, drivers just need to be
> told policy. As far as users are concerned, the device can be always
> available; as far as parents and children go, messages can be passed
> between drivers to notify of events, and DBus can be used to notify
> userspace of events.

I agree that individual drivers should track their own usage.  I was
just wondering if we could provide some utility functions that would
help with this.

> > 
> > > 
> > > > A common problem facing all drivers that do auto suspend is how to set
> > > > the inactivity timeout.  Two possible answers are: add an attribute
> > > > file in the /sys/.../power directory (so different devices can have
> > > > different timeouts), or add a driver module parameter (so all devices
> > > > using the same driver will have the same timeout).
> > 
> > Each class could have its own policies, complete with timeout values.  In
> > sysfs, the user could select which policy should be used (e.g. performance,
> > normal, powersave).
> 
> I like the per-instance idea better. If I have, for example, two
> harddrives on the same bus, I might have very different usage patterns
> for them. I might want one to spend most of its time powered down, and
> the other to be always on.

I'm sorry I wasn't clear on this.  I also want the settings to be
per-instance, not per-class.  I do think, however, that power management
policy is a class-level issue.  In other words, a class could provide an
array of policies, and each device could individually be assigned one.
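Here is a minimal C sketch of that split. It assumes a hypothetical struct pm_class that owns an array of named policies and a struct pm_device that points at one of them. The policy names follow the performance/normal/powersave example in the thread. The timeout values are placeholders, not recommendations, and none of these types exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical class-level policy: defined once by the class. */
struct pm_policy {
	const char *name;
	unsigned int idle_timeout_ms;  /* 0 = never auto-suspend */
};

struct pm_class {
	const struct pm_policy *policies;
	size_t nr_policies;
};

/* Hypothetical device instance: selects one of its class's policies. */
struct pm_device {
	const struct pm_class *cls;
	const struct pm_policy *policy;  /* per-instance selection */
};

/* Placeholder values for illustration only. */
static const struct pm_policy disk_policies[] = {
	{ "performance", 0 },
	{ "normal",      300000 },
	{ "powersave",   30000 },
};

static const struct pm_class disk_class = {
	disk_policies, sizeof(disk_policies) / sizeof(disk_policies[0])
};

/* Select a policy by name, e.g. after a write to a per-device sysfs file.
 * Returns 0 on success, or -1 (leaving the old policy in place) if the
 * class has no policy with that name. */
static int pm_device_set_policy(struct pm_device *dev, const char *name)
{
	size_t i;

	assert(dev != NULL && dev->cls != NULL);
	for (i = 0; i < dev->cls->nr_policies; i++) {
		if (strcmp(dev->cls->policies[i].name, name) == 0) {
			dev->policy = &dev->cls->policies[i];
			return 0;
		}
	}
	return -1;
}
```

With this layout, two disks on the same bus share the class's policy table but can each select a different entry. That covers the two-harddrive case above without duplicating policy definitions in every device.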

> 
> > > > For user suspends (made through sysfs) the user may want to convey
> > > > arbitrary information to a driver, things like which clocks to turn
> > > > off, which power level to change to, and so on.  This information
> > > > will vary from driver to driver, and the PM core shouldn't even try to
> > > > impose any sort of structure on it.  I think the best approach will be
> > > > to pass to the driver a character pointer giving the data written to
> > > > /sys/.../power/state, so that users can send whatever they want just
> > > > by writing it to the file.  This means adding an additional field to
> > > > pm_message_t.
> > > 
> > > Uh, that would really suck. This would entail a string parser in every
> > > driver, which is what we wanted to get away from with sysfs. A better way
> > > would be to have a driver export a file with the specific features that it
> > > supports encoded in a meaningful and efficient way (i.e. a fixed-length
> > > string, character, or constant).
> > 
> > Agreed.
> 
> <heresy> I wonder if using sysfs is even the best method for doing
> run-time PM. It will force your imaginary nice userspace interface to
> include code to scan the whole directory tree looking for files of each
> kind, perhaps sorting and collating and so on. Maybe a DBus interface
> would be better? </heresy>

So a request for suspending a device would be sent through the netlink
socket?  Could you provide a few more details on the alternative?
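Returning to Patrick's objection to a string parser in every driver: one way to keep parsing out of drivers is for each driver to advertise a fixed set of supported states and let one shared helper match the written text against that set. The sketch below is hypothetical. pm_parse_state() and the D0-D3 token set are illustrative, not an existing interface, and D-state names are used only because D-states are discussed below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed token set shared by all drivers. */
enum pm_state_token { PM_TOK_D0, PM_TOK_D1, PM_TOK_D2, PM_TOK_D3, PM_TOK_INVALID };

static const char *const pm_token_names[] = { "D0", "D1", "D2", "D3" };

/* One shared parser: map the text written to a state file onto a token.
 * 'supported' is a bitmask (bit n = token n) advertised by the driver,
 * so the driver never sees the raw string. */
static enum pm_state_token pm_parse_state(const char *buf, size_t len,
					  unsigned int supported)
{
	size_t i;

	assert(buf != NULL || len == 0);
	/* Tolerate the trailing newline that "echo D3 > state" appends. */
	if (len > 0 && buf[len - 1] == '\n')
		len--;
	for (i = 0; i < sizeof(pm_token_names) / sizeof(pm_token_names[0]); i++) {
		if (strlen(pm_token_names[i]) == len &&
		    strncmp(buf, pm_token_names[i], len) == 0)
			return (supported & (1u << i)) ? (enum pm_state_token)i
						       : PM_TOK_INVALID;
	}
	return PM_TOK_INVALID;
}
```

Anything outside the advertised set, including free-form strings such as "D3 clocks=off", is rejected by the shared helper instead of being handed to the driver.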

> 
> > Another concern I had is how to relate power states between devices.  The
> > most widely used convention seems to be D-states (D0, D1, ...), as used by
> > PCI, ACPI, and others.  It's not uncommon for a child device to require
> > the parent to be in a given state for wake events etc.  If the child isn't
> > using the same names for power states, how could this be possible?  Also,
> > how could a class-level policy interact with devices that use different
> > state names?  I'm inclined to favor using only D-states.
> > 
> > Finally, I'm not sure if I like the current "*probe", "*remove", "*suspend",
> > and "*resume" for runtime power management.  I think it may be better to do
> > something like the following:
> > 
> > *attach - Allocates data structures, creates sysfs entries, and prepares
> >           the driver to handle the hardware.
> > 
> > *start  - Sets up device resources and configures the hardware.
> >           (mostly physical)
> > 
> > *open   - Engages the hardware and allows the class to use it.
> >           (logical and physical)
> > 
> > *close  - Disengages the hardware and stops access.
> >           (logical and physical)
> > 
> > *stop   - Disables the hardware at a physical level.
> >           (mostly physical)
> > 
> > *detach - Tears down the driver and releases it from the "struct device".
> > 
> > *power  - Saves and restores state and transitions power.  It could take
> >           the current state and the new state as arguments.
> > 
> > 
> > The idea here is that if a device could be put into a lower state in which
> > it isn't operational, but still maintains configuration, then we could just
> > use "close" and "open".  For complete poweroffs we could do:
> 
> I would argue that even for a complete poweroff, the device should
> maintain its configuration (in normal memory), assuming that here you
> mean details like IRQ usage.

The driver will still have the configuration in normal memory, but the
values have to be written to the hardware again.  For example, the
ipw2200 firmware would have to be reloaded after a poweroff.  Also, with
more advanced resource management, assignments may change (e.g. when
resources are rebalanced).  The device itself retains nothing the driver
programmed into it before the poweroff.
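The difference between a shallow state (close/open) and a full poweroff (stop/start, with the hardware reprogrammed from the driver's in-memory copy) can be sketched in user-space C. The device model is hypothetical: the hw fields stand in for registers and loaded firmware, only open/close/start/stop/power from the proposed list are modelled, and D3 is taken here to mean that power is removed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model for the proposed callback split. */
enum dstate { D0, D1, D2, D3 };  /* D3 here means power removed */

struct config {
	int irq;               /* illustrative resource assignment */
	bool firmware_loaded;  /* e.g. firmware that must be reloaded */
};

struct fake_dev {
	enum dstate state;
	struct config saved;   /* driver's copy in normal memory; survives poweroff */
	struct config hw;      /* what the hardware currently holds */
	bool hw_valid;         /* false once stop() or a poweroff has run */
	bool opened;           /* class is allowed to use the device */
};

/* start: configure the hardware from the in-memory copy. */
static void dev_start(struct fake_dev *d)
{
	d->hw = d->saved;
	d->hw_valid = true;
}

/* stop: disable the hardware at a physical level; a later start is needed. */
static void dev_stop(struct fake_dev *d)
{
	d->hw_valid = false;
}

/* open: engage the hardware and let the class use it. */
static void dev_open(struct fake_dev *d)
{
	assert(d->hw_valid && d->hw.firmware_loaded);
	d->opened = true;
}

/* close: disengage the hardware and stop access. */
static void dev_close(struct fake_dev *d)
{
	d->opened = false;
}

/* power: move between D-states.  Shallow states keep the hardware's
 * configuration, so close/open suffices; D3 loses it, so the device is
 * stopped and must be started (reprogrammed) again before use. */
static void dev_power(struct fake_dev *d, enum dstate new_state)
{
	if (new_state == d->state)
		return;
	if (new_state != D0) {
		dev_close(d);
		if (new_state == D3) {
			dev_stop(d);
			d->hw.irq = -1;               /* power removed: contents lost */
			d->hw.firmware_loaded = false;
		}
	} else {
		if (!d->hw_valid)
			dev_start(d);                 /* e.g. reload firmware, reassign IRQ */
		dev_open(d);
	}
	d->state = new_state;
}
```

In this sketch a round trip through D1 only closes and reopens the device. A round trip through D3 goes through stop and start, so the hardware is reprogrammed from the saved copy, which may itself have changed after a rebalance.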

> 
> Regards,
> 
> Nigel
> 

Thanks,
Adam