On Sat, 5 Mar 2005, Alan Stern wrote:

> Up to now, the PM development effort has been concerned primarily with
> system-wide sleep transitions, things like Suspend-To-RAM (STR) and
> Suspend-To-Disk (STD). (A more general, less PC-centric description
> would call these states "deep sleep" and "shallow sleep". A third
> possible state, which some people might be in favor of, is Standby or
> "very shallow sleep".)

Ugh. I see there is still disagreement about naming. What type of
platform uses that naming scheme? I've always been under the impression
that STR, STD, and Standby were generic names; at least that is what
has been stated in the code and email for ~4 years.

> Now it's time to consider how to implement additional power-saving
> measures -- in other words, selective suspends.

This has commonly been referred to as "Runtime Power Management" or
more generally "Device Power Management" (as opposed to "System Power
Management").

> A common problem for all selective suspends is that, unlike system
> sleeps, they can occur at any time. Drivers will get very confused
> unless we can guarantee somehow that, at a minimum, they will not
> receive a suspend or resume call for a device while its probe or
> release routine is running.

That's a good point. In general, a driver should only get
suspend/resume calls when it's bound to a device, which is technically
after ->probe() and before ->release(). This means that the interface
for controlling power state should only be exposed during that time.
(Currently the power/ directory is added when the (physical) device is
registered.)

We can change this by not adding the power/ directory (and associated
files) until after the driver is bound. But, I think a better solution
would be for the bus subsystems to add/remove the power control files,
since they a) know when the driver is bound/unbound and b) are likely
to have a bus-specific interface (like the name and number of power
states to enter).
This would also easily allow the bus to provide a default power
interface for devices that are not bound to drivers.

> An important difference between system sleep and selective suspend is
> that with selective suspend, we generally expect the device to resume
> on demand. This demand may take the form of a request to the driver
> (e.g., a block I/O request for a disk device) or a resume request from
> the device itself (e.g., a notification from a mouse that has just
> been moved). This means that input queues must not be plugged and
> device interrupts must remain enabled, exactly the opposite of what
> happens during system sleep. For this reason it is vital for drivers
> to know whether a suspend call is invoking a system sleep or a
> selective suspend. Hence I propose that a new pm_message_t event code,
> PMSG_SELECTIVE (or maybe PMSG_SELECTIVE_SUSPEND), be used for selective
> suspends.

I +/- agree, though I think there also must be a way to completely
suspend the device, like when you are doing a system suspend.

> With resume-on-demand implemented properly, a driver may decide that
> it can suspend its device without bothering to suspend the device's
> children. This kind of decision should be left to individual drivers
> and the PM core shouldn't try to enforce a "children must be suspended
> before their parents" policy for selective suspends.

Also true, and even true for system suspend states. While some child
devices may not support PM, a parent device could, and could power down
the entire bus. It's important that we do descendant-ancestor ordering
correctly during system suspend transitions.

For runtime transitions, we need a way for the driver of a parent
device to return an error if its child devices aren't in a compatible
state for it (the parent) to be suspended. This would be doing
something like partial-tree suspends, but I'm not sure if this is best
done in the kernel or in userspace with a proper tool.
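
The parent-side check described above could be sketched roughly as
follows. This is a standalone illustration, not kernel code: struct
sketch_dev, the SKETCH_* states, the min_child_state field and both
functions are hypothetical names invented for the example, and using
-EBUSY as the error is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical power states, ordered from fully on to deepest sleep. */
enum sketch_pm_state { SKETCH_ON = 0, SKETCH_STANDBY = 1, SKETCH_OFF = 2 };

/* Hypothetical stand-in for a device node; not the kernel's struct device. */
struct sketch_dev {
	const char *name;
	enum sketch_pm_state state;
	/* Shallowest child state this device needs before it may suspend. */
	enum sketch_pm_state min_child_state;
	struct sketch_dev **children;
	size_t nr_children;
};

/*
 * Parent-side check: refuse with -EBUSY if any direct child is in a
 * shallower state than the parent can tolerate.
 */
static int sketch_children_compatible(const struct sketch_dev *parent)
{
	size_t i;

	for (i = 0; i < parent->nr_children; i++)
		if (parent->children[i]->state < parent->min_child_state)
			return -EBUSY;
	return 0;
}

/* Runtime (selective) suspend: fail rather than force the children down. */
static int sketch_runtime_suspend(struct sketch_dev *dev,
				  enum sketch_pm_state target)
{
	int err = sketch_children_compatible(dev);

	if (err)
		return err;
	dev->state = target;
	return 0;
}
```

In this sketch, a parent whose driver does resume-on-demand would set
min_child_state to SKETCH_ON, so the check always passes and the
children are left alone. A parent that cuts power to the whole bus
would set it to a deeper state and get an error back until its children
have been suspended first.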

> A common problem facing all drivers that do auto suspend is how to set
> the inactivity timeout. Two possible answers are: add an attribute
> file in the /sys/.../power directory (so different devices can have
> different timeouts), or add a driver module parameter (so all devices
> using the same driver will have the same timeout).

It's trickier than that. You want a per-device parameter that can be
adjusted. You also want a per-state parameter so that a device can
gradually enter a deeper and deeper state over time. (You can do it
with 1 timer per device that is set to the timeout value of the next
state when one fires, but that's an implementation issue).

So, it's bus-specific because it involves the name and number of
physical power states. And, it has a driver-specific component that is
adjusted when the driver is bound to the device. Plus, you also need to
make sure, in the drivers, that you adjust/modify the proper timer
values when you enter a specific device state.

This is all screaming for a much more complete bus-specific interface
to power management. It seems like the driver core can provide some
helpers and some common interfaces, but since most of the work is
bus-specific, it should be happening in e.g. PCI and USB.

> For user suspends (made through sysfs) the user may want to convey
> arbitrary information to a driver, things like which clocks to turn
> off, which power level to change to, and so on. This information
> will vary from driver to driver, and the PM core shouldn't even try to
> impose any sort of structure on it. I think the best approach will be
> to pass to the driver a character pointer giving the data written to
> /sys/.../power/state, so that users can send whatever they want just
> by writing it to the file. This means adding an additional field to
> pm_message_t.

Uh, that would really suck. This would entail a string parser in every
driver, which is what we wanted to get away from with sysfs.
A better way would be to have a driver export a file with the specific
features that it supports encoded in a meaningful and efficient way
(i.e. a fixed-length string, character, or constant).


	Pat