[linux-pm] Nested suspends; messages vs. states

stern at rowland.harvard.edu (Alan Stern) · Mon Mar 21 12:11:58 2005

Here are a couple of issues I want to raise before the next IRC session.

Nested suspends:  We know that the PM core tries to avoid increasing a
device's suspend level (i.e., FREEZE -> SUSPEND) as part of a system
sleep.  However...  The core won't have a very good idea of a device's
initial state, and a device may already be suspended when the system sleep
begins.  We have decided that devices' power states are represented by
pointers to structures defined at the bus or device level; the PM core
won't know how to interpret them.  So it won't know whether a device is
already suspended.

There's also the possibility that as part of runtime power management, a
user might tell an already-suspended device to go to a different, but
still suspended, power state.  The core can't filter out such requests
because it doesn't understand the states.  It's not even clear that such
requests _should_ be filtered out.  PM-aware PCI devices, for example,
have no trouble moving from D1 to D2.

The simplest way of handling this is to allow explicitly for such
possibilities.  When a device is asked to go from a very-low-power state
to a slightly-low-power state, it should be legal for the driver to leave
it in the very-low-power state.  It should also be legal for the driver to
go to full power temporarily, then down to the requested power level.  In
particular, if a device is already suspended then it should be okay for
the driver to do nothing and still return Success for a FREEZE or SUSPEND
request -- and this fact should be documented.

Another way to handle this is to include a generic "low power" flag as a
standard part of the new power-state structures.  That way the core would
at least know whether a device was at full power.  (Maybe include a
"quiescent" flag too, since some devices can be operational while at low
power.)  While this isn't a bad idea, I rather favor the other approach.
Of course we can always do both.

Messages vs. states:  At the moment the PM core seems to be pretty
confused over this distinction.  Right in the definition of struct
dev_pm_info we have:

	pm_message_t		power_state;

Obviously a message isn't the same thing as a state.  Fixing this will
probably mean changing a lot of drivers when we introduce the new notion
of a power state.

As a corollary we have the problem of what to include in the argument
passed to a suspend callback.  It should be a message, clearly, and 
part of the message should be an indication of which state to go to.  The
question is, how is this state represented?  For device power management
we will want to provide a genuine power state (i.e., a pointer to a bus- or
device-specific structure).  For system power management we will want to
provide a generic code -- PMSG_ON, PMSG_FREEZE, or PMSG_SUSPEND -- which
the driver will map to a real power state.
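For the system-sleep case, the driver-side mapping might look like the sketch below.  The DEMO_PMSG_* names are stand-ins for the generic codes mentioned above (the kernel's own PMSG_* definitions are different), and the policy (FREEZE only quiesces and stays in D0, SUSPEND goes to the deepest state the device supports) is an assumption chosen for illustration.

```c
#include <assert.h>

/* Stand-ins for the generic system-sleep codes; not the kernel's PMSG_*. */
enum demo_pm_code { DEMO_PMSG_ON, DEMO_PMSG_FREEZE, DEMO_PMSG_SUSPEND };

/* Hypothetical PCI-style device states. */
enum demo_dstate { DEMO_D0 = 0, DEMO_D1 = 1, DEMO_D2 = 2, DEMO_D3HOT = 3 };

/* Driver-side mapping from a generic code to a real device state.
 * 'deepest' is the lowest-power state this particular device supports. */
static enum demo_dstate demo_map_code(enum demo_pm_code code,
				      enum demo_dstate deepest)
{
	switch (code) {
	case DEMO_PMSG_ON:
	case DEMO_PMSG_FREEZE:
		return DEMO_D0;		/* FREEZE: stop I/O, keep power */
	case DEMO_PMSG_SUSPEND:
		return deepest;
	}
	assert(0 && "unknown generic PM code");
	return DEMO_D0;
}
```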

It seems to me the best way to do this is to let pm_message_t include both
a generic code and a power-state pointer.  There should be a new code
added (PMSG_RUNTIME? or maybe PMSG_DEVICE?), meaning that the driver
should use the state pointer.  Otherwise the driver maps the generic code.
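A minimal sketch of the proposed message layout, assuming a plain struct with a generic code plus an opaque state pointer that the core passes through without interpreting.  The names demo_pm_message, DEMO_PMSG_RUNTIME, demo_dev_state and demo_target are hypothetical; DEMO_PMSG_RUNTIME plays the role of the suggested PMSG_RUNTIME/PMSG_DEVICE code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic codes, including the proposed "use the pointer" code. */
enum demo_pm_code {
	DEMO_PMSG_ON,
	DEMO_PMSG_FREEZE,
	DEMO_PMSG_SUSPEND,
	DEMO_PMSG_RUNTIME,	/* device PM: 'state' holds a genuine power state */
};

/* Hypothetical message: generic code plus an opaque state pointer. */
struct demo_pm_message {
	enum demo_pm_code code;
	const void *state;	/* opaque to the core; used only with DEMO_PMSG_RUNTIME */
};

/* Hypothetical driver-private power states. */
struct demo_dev_state { int d_state; };
static const struct demo_dev_state demo_d0 = { 0 };
static const struct demo_dev_state demo_d2 = { 2 };
static const struct demo_dev_state demo_d3 = { 3 };

/* Driver side: pick the target state from the message. */
static const struct demo_dev_state *demo_target(struct demo_pm_message msg)
{
	switch (msg.code) {
	case DEMO_PMSG_RUNTIME:
		assert(msg.state != NULL);
		return msg.state;	/* caller supplied a device-specific state */
	case DEMO_PMSG_SUSPEND:
		return &demo_d3;	/* system PM: driver maps the generic code */
	case DEMO_PMSG_ON:
	case DEMO_PMSG_FREEZE:
	default:
		return &demo_d0;
	}
}
```

The point of the layout is that the core can build system-sleep messages without knowing any bus's state structures, while runtime requests carry a real state straight through to the driver.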

Alan Stern