Hi Folks !

I think we are going nowhere on linux-pm at the moment. We won't solve world hunger with endless discussions aimed at designing the ultimate scheme that solves all problems at once. Besides, we have plenty of other issues besides the driver callback to deal with if we want at least a reliable system suspend.

What about changing the whole thing in a way a bit closer to what Darwin does, and in a way that preserves existing drivers and makes for a smoother transition (basically defining a _new_ callback) ? That means deciding that the suspend() and resume() callbacks are obsolete, but still supported by a default mechanism when the new stuff is not there, while adding a new set_power_state() callback. The transition would be easy since it doesn't break existing drivers.

The idea is simple. Drivers describe their possible power states in an array, where each entry contains:

 - state name (ASCII) for use by sysfs (see below)
 - some flags (for later use mostly)
 - matching system state
 - additional information (see below); the struct can be extended for later uses.

set_power_state() is called to get to a state (passing the state index in the array), along with a possible argument giving indications on the cause of the state transition (our former flags).

The matching system state is used to find an appropriate driver state for a system sleep state. System sleep states are globally defined; for now, they would be PM_SYSTEM_RUNNING, PM_SYSTEM_IDLE and PM_SYSTEM_SUSPENDED. They are defined as bits in a mask so a driver can match several system states to one power state. RUNNING can be 0.

Eventually, we can add a field for 'parent bus states' which, similar to the system state, is defined in the context of the parent bus, though we need to check how we define the matching here (AND of both system & parent state ? OR ?)

To finish with this state structure, let's say we can in the future add things like power requirements for the state etc. in there.
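To make the state table a bit more concrete, here is a rough userspace sketch of what it could look like, and of how a system state could be matched to a driver state. Only set_power_state() and the three PM_SYSTEM_* names come from the proposal above; the struct and field names, the bit values, the helper functions and the "demo" driver are all made up for illustration. It also assumes that, since RUNNING is 0, a state with an empty mask is the one that matches RUNNING.

```c
#include <stddef.h>

/* System sleep states as mask bits. Names are from the proposal,
 * the numeric values are an assumption. */
#define PM_SYSTEM_RUNNING   0x0   /* "RUNNING can be 0" */
#define PM_SYSTEM_IDLE      0x1
#define PM_SYSTEM_SUSPENDED 0x2

/* Hypothetical per-state descriptor provided by the driver. */
struct pm_state_desc {
    const char  *name;          /* ASCII name, for sysfs */
    unsigned int flags;         /* mostly for later use */
    unsigned int system_match;  /* mask of PM_SYSTEM_* bits served by this state */
};

/* Hypothetical device: the driver table plus the current index into it. */
struct pm_device {
    const struct pm_state_desc *states;
    size_t nstates;
    size_t cur_state;
    int (*set_power_state)(struct pm_device *dev, size_t index, int reason);
};

/* Return the index of the first driver state matching sys_state, or -1.
 * Assumption: an empty mask means "this is the RUNNING state". */
static int pm_find_state(const struct pm_device *dev, unsigned int sys_state)
{
    for (size_t i = 0; i < dev->nstates; i++) {
        unsigned int mask = dev->states[i].system_match;

        if (sys_state == PM_SYSTEM_RUNNING ? mask == 0
                                           : (mask & sys_state) != 0)
            return (int)i;
    }
    return -1;
}

/* Move a device to whatever driver state matches the system state. */
static int pm_enter_system_state(struct pm_device *dev,
                                 unsigned int sys_state, int reason)
{
    int index = pm_find_state(dev, sys_state);
    int err;

    if (index < 0)
        return -1;
    err = dev->set_power_state(dev, (size_t)index, reason);
    if (err == 0)
        dev->cur_state = (size_t)index;  /* current state is just the table index */
    return err;
}

/* A made-up driver with three states. */
static const struct pm_state_desc demo_states[] = {
    { "on",   0, PM_SYSTEM_RUNNING   },
    { "doze", 0, PM_SYSTEM_IDLE      },
    { "off",  0, PM_SYSTEM_SUSPENDED },
};

static int demo_set_power_state(struct pm_device *dev, size_t index, int reason)
{
    (void)dev; (void)index; (void)reason;  /* real hardware access would go here */
    return 0;
}
```

With a device built from demo_states, asking for PM_SYSTEM_SUSPENDED picks the "off" entry and asking for PM_SYSTEM_IDLE picks "doze". A driver that wants one state to serve both IDLE and SUSPENDED would simply OR the two bits in system_match.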
struct device would then hold the current state (this is really helpful to have, and now we have a clear meaning for it: it's an index into the driver-provided table). Drivers without a table would just "map" to a globally defined "simple" one that does boolean suspend/resume, and to a global set_power_state() that calls suspend and resume (always with 3 as the suspend() argument, which is what works now, unless we also want to have 4 around for STD...).

The notion of "freezing" is easy here too: it's implicit from the state. A driver state mapping to a system suspend state implies freeze, while other states are driver dependent. The "flags" can be useful just for that too: indicating when a state implies a functional freeze (driver not operational), or when it implies auto-resume.

The above states and their attributes can be exposed to userland in sysfs via a power_states entry, while the "power" entry would be written to with the state name. To avoid some possibly dangerous cases, we can even have a flag indicating that a given state is not to be entered from userland. If userland triggers a frozen state on a critical device (like swap), then it's just root shooting himself in the foot, so I don't really care as long as the information is available for userland to take the appropriate policy decision.

Now, regarding partial tree (that is, whether the state has to be propagated down to child devices): that is in fact the last remaining question that my mechanism doesn't solve, unless we add the notion of "bus states", in addition to system states, that I talked about earlier. The idea here is that drivers can match on a bus state (as explained above) and can provide a bus state (2 different fields in the per-state structure). So upon a state change, we can propagate down by doing a new matching of the children based on the parent's new state. The actual state bits being defined for a given bus type only, we are totally flexible here.
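As a rough illustration of the "match a bus state / provide a bus state" idea, the sketch below re-matches children whenever the parent changes state. Everything in it is hypothetical: the struct and field names, the DEMO_BUS_* bits and the demo hub/leaf devices. It recurses for brevity, takes the first matching child state, and only records state indices; it ignores ordering (children going down before the parent) and does no real hardware work.

```c
#include <stddef.h>

/* Made-up bus state bits, meaningful only for this one demo bus type. */
#define DEMO_BUS_ACTIVE  0x1
#define DEMO_BUS_SUSPEND 0x2

/* Hypothetical per-state entry carrying the two bus-related fields. */
struct bus_state_desc {
    const char  *name;
    unsigned int bus_match;    /* parent bus states this state is valid under */
    unsigned int bus_provide;  /* bus state this state gives to its children */
};

struct bus_device {
    const struct bus_state_desc *states;
    size_t nstates;
    size_t cur_state;          /* index into states[] */
    struct bus_device **children;
    size_t nchildren;
};

/* First state of dev that is valid under the given parent bus state, or -1. */
static int bus_find_state(const struct bus_device *dev, unsigned int bus_state)
{
    for (size_t i = 0; i < dev->nstates; i++)
        if (dev->states[i].bus_match & bus_state)
            return (int)i;
    return -1;
}

/* Set dev to states[index], then re-match every child against the
 * bus state that the new state provides. Returns 0 on success. */
static int bus_set_state(struct bus_device *dev, size_t index)
{
    if (index >= dev->nstates)
        return -1;
    dev->cur_state = index;

    unsigned int provided = dev->states[index].bus_provide;

    for (size_t c = 0; c < dev->nchildren; c++) {
        struct bus_device *child = dev->children[c];
        int next = bus_find_state(child, provided);

        if (next < 0 || bus_set_state(child, (size_t)next) != 0)
            return -1;
    }
    return 0;
}

/* Demo: a hub providing a bus state, and one leaf device below it. */
static const struct bus_state_desc demo_leaf_states[] = {
    { "on",  DEMO_BUS_ACTIVE,  0 },
    { "off", DEMO_BUS_SUSPEND, 0 },
};
static const struct bus_state_desc demo_hub_states[] = {
    { "on",        DEMO_BUS_ACTIVE,                    DEMO_BUS_ACTIVE  },
    { "suspended", DEMO_BUS_ACTIVE | DEMO_BUS_SUSPEND, DEMO_BUS_SUSPEND },
};

static struct bus_device demo_leaf = { demo_leaf_states, 2, 0, NULL, 0 };
static struct bus_device *demo_hub_children[] = { &demo_leaf };
static struct bus_device demo_hub = { demo_hub_states, 2, 0, demo_hub_children, 1 };
```

Calling bus_set_state(&demo_hub, 1) puts the hub in "suspended", which provides DEMO_BUS_SUSPEND, so the leaf gets re-matched to its "off" state; going back to index 0 brings the leaf back to "on".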
For example, USB can define active & suspend bus states. PCI can define active, clock lost, D3 power, off, etc. But we don't need to define everything right now, that's the cool thing. In a first implementation, we wouldn't have partial tree suspend, so those fields may not be used right away, but I think we should consider going that way.

I also think we shouldn't move things around lists anymore. We walk the PM tree, that's it. Since we know the state of a given device at any time, it's rather easy to walk it, even in non-recursive code (eventually using a local stack as a list). That would make it easy to implement the broadcast of bus states to children.

Ok, enough for now. I think this is rather easy to implement, flexible enough to cover pretty much all needs but the most exotic ones (which can always be "hacked"), and will keep existing drivers working by simply defaulting, in the absence of set_power_state(), to suspend/resume on a best-effort basis.

Comments ?

Ben.