On Sunday, 9 of March 2008, Alan Stern wrote: > On Sun, 9 Mar 2008, Rafael J. Wysocki wrote: > > > > Didn't we decide it would be better to call @prepare in a separate > > > pass? > > > > Afterwards I realized that it would be racy. Namely, once a new device is > > registered, it may require ->prepare() to be called before ->suspend(), so if > > that happens after we've called ->prepare() for all devices, we'll be in > > trouble. > > There's nothing wrong with doing it this way. However the race you're > worried about will never happen if drivers are written correctly. Once > prepare has been called for all devices, no new devices should be > registered. That only applies to the devices with parents. Apart from this, the device can be registered concurrently with its parent's ->prepare() and in that case we should call ->complete() for the parent etc. IOW, almost the same complications as in dpm_suspend() would have to be taken into account. [--snip--] > > + * @prepare: Prepare the device for the upcoming transition, but do NOT change > > + * its hardware state. Prevent new children of the device from being > > + * registered and prevent new calls to the probe method from being made > > + * after @prepare() returns. Also, if a concurrent child registration > > + * or a call to the probe method already in progress is detected by > > + * @prepare(), it may return -EAGAIN, so that the PM core can execute it > > + * once again (e.g. after suspending the newly registered child) to recover > > + * from the race condition. This method is executed for all kinds of > > + * suspend transitions and is immediately followed by one of the suspend > > + * callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce(). > > + * [NOTE: The PM core is able to detect a registration of a new child of a > > + * device concurrent with this device's @prepare() callback, in which case > > + * @complete() will be called for the device and the core will try again to > > + * suspend it later. However, the core is NOT able to handle a call to > > + * the probe method concurrent with @prepare().] > > I think you can leave out this NOTE. Direct probe method calls by > drivers are pretty rare; almost all probing is done by the driver core. Done. > > + * @complete: Undo the changes made by @prepare(). This method is executed for > > + * all kinds of resume transitions, immediately following one of the resume > > + * callbacks: @resume(), @thaw(), @restore(), or @recover(). Also executed > > + * if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce()) > > + * immediately following a successful @prepare() fails OR if @prepare() > > + * returns -EAGAIN. > > The last part is wrong. Yes, I noticed that too. > If @prepare returns an error (such as -EAGAIN) then @complete shouldn't be called. Hmm, I'd prefer to call it if ->prepare() fails too. The -EAGAIN case is special, because in fact there's no warranty that ->prepare() will be called again (for example, the new child's ->suspend() may fail before that happens), so we need to do the clean up after it. In general, if ->complete() is assumed to be called after a failing ->prepare() too, that simplifies the code a bit. [--snip--] > > > +enum dpm_state { > > + DPM_ON, > > + DPM_SUSPENDING, > > + DPM_OFF, > > +}; > > There should be kerneldoc for dpm_state. In fact there is one, right above the definition of 'enum dpm_state'. It was placed a bit confusingly under the big comment about the legacy stuff. I changed the ordering of things in pm.h to avoid that confusion. [--snip--] > > > @@ -71,22 +71,29 @@ int device_pm_add(struct device *dev) > > dev->bus ? dev->bus->name : "No Bus", > > kobject_name(&dev->kobj)); > > mutex_lock(&dpm_list_mtx); > > - if ((dev->parent && dev->parent->power.sleeping) || all_sleeping) { > > - if (dev->parent->power.sleeping) > > - dev_warn(dev, > > - "parent %s is sleeping, will not add\n", > > + if (all_sleeping) { > > + dev_warn(dev, "devices are sleeping, will not add\n"); > > + goto Failure; > > + } > > + if (dev->parent) { > > + if (dev->parent->power.status == DPM_OFF) { > > + dev_warn(dev, "parent %s is sleeping, will not add\n", > > dev->parent->bus_id); > > - else > > - dev_warn(dev, "devices are sleeping, will not add\n"); > > - WARN_ON(true); > > - error = -EBUSY; > > - } else { > > - error = dpm_sysfs_add(dev); > > - if (!error) > > - list_add_tail(&dev->power.entry, &dpm_active); > > + goto Failure; > > + } else if (dev->parent->power.status == DPM_SUSPENDING) { > > + dev->parent->power.status == DPM_ON; > > + } > > This part puzzled me until I read the rest of the patch. It's a good > way to look specifically for new _child_ registrations as opposed to > any new registrations. I reworked that a bit. > > @@ -418,22 +629,37 @@ static int dpm_suspend(pm_message_t stat > > struct list_head *entry = dpm_active.prev; > > struct device *dev = to_device(entry); > > > > - WARN_ON(dev->parent && dev->parent->power.sleeping); > > + WARN_ON(dev->parent && dev->parent->power.status == DPM_OFF); > > + dev->power.status = DPM_SUSPENDING; > > + mutex_unlock(&dpm_list_mtx); > > + > > + error = prepare_device(dev, state); > > + > > + mutex_lock(&dpm_list_mtx); > > + if (error == -EAGAIN) > > + dev->power.status = DPM_ON; > > + else if(error) > > + break; > > The last four lines should be: > > if (error) { > dev->power.status = DPM_ON; > if (error == -EAGAIN) > continue; > break; > } Well, I reworked this part too, but in a different way. > Insert: /* Was a child added during prepare_device()? */ I added a different comment here. > > + if (dev->power.status == DPM_ON) { > > + mutex_unlock(&dpm_list_mtx); > > + > > + complete_device(dev, resume_event(state)); > > > > - dev->power.sleeping = true; > > + mutex_lock(&dpm_list_mtx); > > + continue; > > + } > > + dev->power.status = DPM_OFF; > > mutex_unlock(&dpm_list_mtx); > > + > > error = suspend_device(dev, state); > > + > > mutex_lock(&dpm_list_mtx); > > if (error) { > > - printk(KERN_ERR "Could not suspend device %s: " > > - "error %d%s\n", > > - kobject_name(&dev->kobj), > > - error, > > - (error == -EAGAIN ? > > - " (please convert to suspend_late)" : > > - "")); > > - dev->power.sleeping = false; > > - break; > > + dev->power.status = DPM_ON; > > + mutex_unlock(&dpm_list_mtx); > > + > > + complete_device(dev, resume_event(state)); > > + return error; > > Early return? I guess it's okay... > > > } > > if (!list_empty(&dev->power.entry)) > > list_move(&dev->power.entry, &dpm_off); > > Looks good. Well, I reworked it somewhat in the meantime. First, I moved the pm_dev_dbg() stuff into the inner if () blocks. Second, I made it possible to use ->prepare() and ->complete() in the "noirq" case as well and added a comment about the "noirq" callbacks. Finally, I cut some rough edges here and there (at least I hope so). The new patch is appended. Thanks, Rafael --- arch/x86/kernel/apm_32.c | 4 drivers/base/power/main.c | 463 +++++++++++++++++++++++++++++++++++----------- include/linux/device.h | 8 include/linux/pm.h | 238 ++++++++++++++++++++--- kernel/power/disk.c | 14 - kernel/power/main.c | 4 6 files changed, 579 insertions(+), 152 deletions(-) Index: linux-2.6/include/linux/pm.h =================================================================== --- linux-2.6.orig/include/linux/pm.h +++ linux-2.6/include/linux/pm.h @@ -110,11 +110,9 @@ extern void (*pm_power_off_prepare)(void struct device; -typedef struct pm_message { - int event; -} pm_message_t; - -/* +/** + * struct pm_ops - device PM callbacks + * * Several driver power state transitions are externally visible, affecting * the state of pending I/O queues and (for drivers that touch hardware) * interrupts, wakeups, DMA, and other hardware state. There may also be @@ -122,6 +120,206 @@ typedef struct pm_message { * to the rest of the driver stack (such as a driver that's ON gating off * clocks which are not in active use). * + * The externally visible transitions are handled with the help of the following + * callbacks included in this structure: + * + * @prepare: Prepare the device for the upcoming transition, but do NOT change + * its hardware state. Prevent new children of the device from being + * registered and prevent new calls to the probe method from being made + * after @prepare() returns. Also, if a concurrent child registration + * or a call to the probe method already in progress is detected by + * @prepare(), it may return -EAGAIN, so that the PM core can execute it + * once again (e.g. after suspending the newly registered child) to recover + * from the race condition. This method is executed for all kinds of + * suspend transitions and is immediately followed by one of the suspend + * callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce(). + * + * @complete: Undo the changes made by @prepare(). This method is executed for + * all kinds of resume transitions, immediately following one of the resume + * callbacks: @resume(), @thaw(), @restore(), or @recover(). Also executed + * if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce()) + * immediately following a successful @prepare() fails OR if @prepare() + * itself fails. @complete() cannot fail. + * + * @suspend: Executed before putting the system into a sleep state in which the + * contents of main memory are preserved. Quiesce the device, put it into + * a low power state appropriate for the upcoming system state (such as + * PCI_D3hot), and enable wakeup events as appropriate. + * + * @resume: Executed after waking the system up from a sleep state in which the + * contents of main memory were preserved. Put the device into the + * appropriate state, according to the information saved in memory by the + * preceding @suspend(). The driver starts working again, responding to + * hardware events and software requests. The hardware may have gone + * through a power-off reset, or it may have maintained state from the + * previous suspend() which the driver may rely on while resuming. On most + * platforms, there are no restrictions on availability of resources like + * clocks during @resume(). + * + * @freeze: Hibernation-specific, executed before creating a hibernation image. + * Quiesce operations so that a consistent image can be created, but do NOT + * otherwise put the device into a low power device state and do NOT emit + * system wakeup events. Save in main memory the device settings to be + * used by @restore() during the subsequent resume from hibernation. + * + * @thaw: Hibernation-specific, executed after creating a hibernation image. + * Undo the changes made by the preceding @freeze(), so the device can be + * operated in the same was as immediately before the call to @freeze(). + * + * @poweroff: Hibernation-specific, executed after saving a hibernation image. + * Quiesce the device, put it into a low power state appropriate for the + * upcoming system state (such as PCI_D3hot), and enable wakeup events as + * appropriate. + * + * @quiesce: Hibernation-specific, executed after loading a hibernation image + * and before restoring the contents of main memory from it. Quiesce + * operations so that the contents of main memory can be restored from the + * image in a consistent way, but do NOT otherwise put the device into a + * low power state and do NOT emit system wakeup events. Do NOT save any + * device settings in main memory. + * + * @restore: Hibernation-specific, executed after restoring the contents of main + * memory from a hibernation image. Driver starts working again, + * responding to hardware events and software requests. Do NOT make ANY + * assumptions about the hardware state right prior to @restore(). On most + * platforms, there are no restrictions on availability of resources like + * clocks during @restore(). + * + * @recover: Hibernation-specific, executed after a failing creation of a + * hibernation image OR after a failing attempt to restore the contents of + * main memory from such an image. Undo the changes made by the preceding + * @freeze() or @quiesce(), so the device can be operated in the same way + * as immediately before the failing transition. + * + * struct pm_ops is also used for defining driver PM operations to be carried + * out with interrupts disabled. In this case, however, there is only one + * active CPU while the operations are being performed, so they are executed by + * the PM core in a more straightforward way. In particular, the failure of + * @prepare() carried out with interrupts disabled causes the entire PM + * transition to fail, regardless of the error code returned by it. + */ + +struct pm_ops { +#ifdef CONFIG_PM_SLEEP + int (*prepare)(struct device *dev); + void (*complete)(struct device *dev); +#endif +#ifdef CONFIG_SUSPEND + int (*suspend)(struct device *dev); + int (*resume)(struct device *dev); +#endif +#ifdef CONFIG_HIBERNATION + int (*freeze)(struct device *dev); + int (*thaw)(struct device *dev); + int (*poweroff)(struct device *dev); + int (*quiesce)(struct device *dev); + int (*restore)(struct device *dev); + int (*recover)(struct device *dev); +#endif +}; + +/** + * PM_EVENT_ messages + * + * The following PM_EVENT_ messages are defined for the internal use of the PM + * core, in order to provide a mechanism allowing the high level suspend and + * hibernation code to convey the necessary information to the device PM core + * code: + * + * ON No transition. + * + * FREEZE System is going to hibernate, call ->prepare() and ->freeze() + * for all devices. + * + * SUSPEND System is going to suspend, call ->prepare() and ->suspend() + * for all devices. + * + * HIBERNATE Hibernation image has been saved, call ->prepare() and + * ->poweroff() for all devices. + * + * QUIESCE Contents of main memory are going to be restored from a (loaded) + * hibernation image, call ->prepare() and ->quiesce() for all + * devices. + * + * RESUME System is resuming, call ->resume() and ->complete() for all + * devices. + * + * THAW Hibernation image has been created, call ->thaw() and + * ->complete() for all devices. + * + * RESTORE Contents of main memory have been restored from a hibernation + * image, call ->restore() and ->complete() for all devices. + * + * RECOVER Creation of a hibernation image or restoration of the main + * memory contents from a hibernation image has failed, call + * ->recover() and ->complete() for all devices. + */ + +typedef struct pm_message { + int event; +} pm_message_t; + +#define PM_EVENT_ON 0x0000 +#define PM_EVENT_FREEZE 0x0001 +#define PM_EVENT_SUSPEND 0x0002 +#define PM_EVENT_HIBERNATE 0x0004 +#define PM_EVENT_QUIESCE 0x0008 +#define PM_EVENT_RESUME 0x0010 +#define PM_EVENT_THAW 0x0020 +#define PM_EVENT_RESTORE 0x0040 +#define PM_EVENT_RECOVER 0x0080 + +#define PM_EVENT_SLEEP (PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE) + +#define PMSG_FREEZE ((struct pm_message){ .event = PM_EVENT_FREEZE, }) +#define PMSG_QUIESCE ((struct pm_message){ .event = PM_EVENT_QUIESCE, }) +#define PMSG_SUSPEND ((struct pm_message){ .event = PM_EVENT_SUSPEND, }) +#define PMSG_HIBERNATE ((struct pm_message){ .event = PM_EVENT_HIBERNATE, }) +#define PMSG_RESUME ((struct pm_message){ .event = PM_EVENT_RESUME, }) +#define PMSG_THAW ((struct pm_message){ .event = PM_EVENT_THAW, }) +#define PMSG_RESTORE ((struct pm_message){ .event = PM_EVENT_RESTORE, }) +#define PMSG_RECOVER ((struct pm_message){ .event = PM_EVENT_RECOVER, }) +#define PMSG_ON ((struct pm_message){ .event = PM_EVENT_ON, }) + +/** + * Device power management states + * + * These state labels are used internally by the PM core to indicate the current + * status of a device with respect to the PM core operations. + * + * DPM_ON Device is regarded as operational. + * + * DPM_SUSPENDING Device is currently being suspended. + * + * DPM_OFF Device is regarded as suspended. + */ + +enum dpm_state { + DPM_ON, + DPM_SUSPENDING, + DPM_OFF, +}; + +struct dev_pm_info { + pm_message_t power_state; + unsigned can_wakeup:1; + enum dpm_state status:2; /* Owned by the PM core */ +#ifdef CONFIG_PM_SLEEP + unsigned should_wakeup:1; + struct list_head entry; +#endif +}; + +/* + * The PM_EVENT_ messages are also used by drivers implementing the legacy + * suspend framework, based on the ->suspend() and ->resume() callbacks common + * for suspend and hibernation transitions, according to the rules below. + */ + +/* Necessary, because several drivers use PM_EVENT_PRETHAW */ +#define PM_EVENT_PRETHAW PM_EVENT_QUIESCE + +/* * One transition is triggered by resume(), after a suspend() call; the * message is implicit: * @@ -166,35 +364,11 @@ typedef struct pm_message { * or from system low-power states such as standby or suspend-to-RAM. */ -#define PM_EVENT_ON 0 -#define PM_EVENT_FREEZE 1 -#define PM_EVENT_SUSPEND 2 -#define PM_EVENT_HIBERNATE 4 -#define PM_EVENT_PRETHAW 8 - -#define PM_EVENT_SLEEP (PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE) - -#define PMSG_FREEZE ((struct pm_message){ .event = PM_EVENT_FREEZE, }) -#define PMSG_PRETHAW ((struct pm_message){ .event = PM_EVENT_PRETHAW, }) -#define PMSG_SUSPEND ((struct pm_message){ .event = PM_EVENT_SUSPEND, }) -#define PMSG_HIBERNATE ((struct pm_message){ .event = PM_EVENT_HIBERNATE, }) -#define PMSG_ON ((struct pm_message){ .event = PM_EVENT_ON, }) - -struct dev_pm_info { - pm_message_t power_state; - unsigned can_wakeup:1; - bool sleeping:1; /* Owned by the PM core */ -#ifdef CONFIG_PM_SLEEP - unsigned should_wakeup:1; - struct list_head entry; -#endif -}; +#ifdef CONFIG_PM_SLEEP +extern void device_power_up(pm_message_t state); +extern void device_resume(pm_message_t state); extern int device_power_down(pm_message_t state); -extern void device_power_up(void); -extern void device_resume(void); - -#ifdef CONFIG_PM_SLEEP extern int device_suspend(pm_message_t state); extern int device_prepare_suspend(pm_message_t state); Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -71,22 +71,30 @@ int device_pm_add(struct device *dev) dev->bus ? dev->bus->name : "No Bus", kobject_name(&dev->kobj)); mutex_lock(&dpm_list_mtx); - if ((dev->parent && dev->parent->power.sleeping) || all_sleeping) { - if (dev->parent->power.sleeping) - dev_warn(dev, - "parent %s is sleeping, will not add\n", + if (all_sleeping) { + dev_warn(dev, "all devices are sleeping, will not add\n"); + goto Refuse; + } + if (dev->parent) + switch (dev->parent->power.status) { + case DPM_OFF: + dev_warn(dev, "parent %s is sleeping, will not add\n", dev->parent->bus_id); - else - dev_warn(dev, "devices are sleeping, will not add\n"); - WARN_ON(true); - error = -EBUSY; - } else { - error = dpm_sysfs_add(dev); - if (!error) - list_add_tail(&dev->power.entry, &dpm_active); - } + goto Refuse; + case DPM_SUSPENDING: + dev->parent->power.status = DPM_ON; + break; + } + error = dpm_sysfs_add(dev); + if (!error) + list_add_tail(&dev->power.entry, &dpm_active); + End: mutex_unlock(&dpm_list_mtx); return error; + Refuse: + WARN_ON(true); + error = -EBUSY; + goto End; } /** @@ -124,26 +132,135 @@ void device_pm_schedule_removal(struct d } EXPORT_SYMBOL_GPL(device_pm_schedule_removal); +/** + * pm_op - execute the PM operation appropiate for given PM event + * @dev: Device. + * @ops: PM operations to choose from. + * @state: PM event message. + */ +static int pm_op(struct device *dev, struct pm_ops *ops, pm_message_t state) +{ + int error = 0; + + switch (state.event) { +#ifdef CONFIG_SUSPEND + case PM_EVENT_SUSPEND: + if (ops->suspend) { + error = ops->suspend(dev); + suspend_report_result(ops->suspend, error); + } + break; + case PM_EVENT_RESUME: + if (ops->resume) { + error = ops->resume(dev); + suspend_report_result(ops->resume, error); + } + break; +#endif /* CONFIG_SUSPEND */ +#ifdef CONFIG_HIBERNATION + case PM_EVENT_FREEZE: + if (ops->freeze) { + error = ops->freeze(dev); + suspend_report_result(ops->freeze, error); + } + break; + case PM_EVENT_QUIESCE: + if (ops->quiesce) { + error = ops->quiesce(dev); + suspend_report_result(ops->quiesce, error); + } + break; + case PM_EVENT_HIBERNATE: + if (ops->poweroff) { + error = ops->poweroff(dev); + suspend_report_result(ops->poweroff, error); + } + break; + case PM_EVENT_THAW: + if (ops->thaw) { + error = ops->thaw(dev); + suspend_report_result(ops->thaw, error); + } + break; + case PM_EVENT_RESTORE: + if (ops->restore) { + error = ops->restore(dev); + suspend_report_result(ops->restore, error); + } + break; + case PM_EVENT_RECOVER: + if (ops->recover) { + error = ops->recover(dev); + suspend_report_result(ops->recover, error); + } + break; +#endif /* CONFIG_HIBERNATION */ + default: + error = -EINVAL; + } + return error; +} + +static char *pm_verb(int event) +{ + switch (event) { + case PM_EVENT_SUSPEND: + return "suspend"; + case PM_EVENT_RESUME: + return "resume"; + case PM_EVENT_FREEZE: + return "freeze"; + case PM_EVENT_QUIESCE: + return "quiesce"; + case PM_EVENT_HIBERNATE: + return "hibernate"; + case PM_EVENT_THAW: + return "thaw"; + case PM_EVENT_RESTORE: + return "restore"; + default: + return "(unknown PM event)"; + } +} + +static void pm_dev_dbg(struct device *dev, pm_message_t state, char *info) +{ + dev_dbg(dev, "%s%s%s\n", info, pm_verb(state.event), + ((state.event & PM_EVENT_SLEEP) && device_may_wakeup(dev)) ? + ", may wakeup" : ""); +} + /*------------------------- Resume routines -------------------------*/ /** - * resume_device_early - Power on one device (early resume). + * resume_device_noirq - Power on one device (early resume). * @dev: Device. + * @state: Operation to carry out. * * Must be called with interrupts disabled. */ -static int resume_device_early(struct device *dev) +static int resume_device_noirq(struct device *dev, pm_message_t state) { int error = 0; TRACE_DEVICE(dev); TRACE_RESUME(0); - if (dev->bus && dev->bus->resume_early) { - dev_dbg(dev, "EARLY resume\n"); + if (!dev->bus) + goto End; + + if (dev->bus->pm_noirq) { + struct pm_ops *ops = dev->bus->pm_noirq; + + pm_dev_dbg(dev, state, "EARLY "); + error = pm_op(dev, ops, state); + if (ops->complete) + ops->complete(dev); + } else if (dev->bus->resume_early) { + pm_dev_dbg(dev, state, "legacy EARLY "); error = dev->bus->resume_early(dev); } - + End: TRACE_RESUME(error); return error; } @@ -158,7 +275,7 @@ static int resume_device_early(struct de * * Must be called with interrupts disabled and only one CPU running. */ -static void dpm_power_up(void) +static void dpm_power_up(pm_message_t state) { while (!list_empty(&dpm_off_irq)) { @@ -166,7 +283,7 @@ static void dpm_power_up(void) struct device *dev = to_device(entry); list_move_tail(entry, &dpm_off); - resume_device_early(dev); + resume_device_noirq(dev, state); } } @@ -178,19 +295,46 @@ static void dpm_power_up(void) * * Must be called with interrupts disabled. */ -void device_power_up(void) +void device_power_up(pm_message_t state) { sysdev_resume(); - dpm_power_up(); + dpm_power_up(state); } EXPORT_SYMBOL_GPL(device_power_up); /** + * complete_device - Complete a PM transition for given device + * @dev: Device. + * @state: Power transition we are completing. + */ +static void complete_device(struct device *dev, pm_message_t state) +{ + down(&dev->sem); + + if (dev->bus && dev->bus->pm && dev->bus->pm->complete) { + pm_dev_dbg(dev, state, "completing "); + dev->bus->pm->complete(dev); + } + + if (dev->type && dev->type->pm && dev->type->pm->complete) { + pm_dev_dbg(dev, state, "completing type "); + dev->type->pm->complete(dev); + } + + if (dev->class && dev->class->pm && dev->class->pm->complete) { + pm_dev_dbg(dev, state, "completing class "); + dev->class->pm->complete(dev); + } + + up(&dev->sem); +} + +/** * resume_device - Restore state for one device. * @dev: Device. - * + * @state: Operation to carry out. */ -static int resume_device(struct device *dev) +static int resume_device(struct device *dev, pm_message_t state) { int error = 0; @@ -199,21 +343,40 @@ static int resume_device(struct device * down(&dev->sem); - if (dev->bus && dev->bus->resume) { - dev_dbg(dev,"resuming\n"); - error = dev->bus->resume(dev); + if (dev->bus) { + if (dev->bus->pm) { + pm_dev_dbg(dev, state, ""); + error = pm_op(dev, dev->bus->pm, state); + } else if (dev->bus->resume) { + pm_dev_dbg(dev, state, "legacy "); + error = dev->bus->resume(dev); + } + if (error) + goto End; } - if (!error && dev->type && dev->type->resume) { - dev_dbg(dev,"resuming\n"); - error = dev->type->resume(dev); + if (dev->type) { + if (dev->type->pm) { + pm_dev_dbg(dev, state, "type "); + error = pm_op(dev, dev->type->pm, state); + } else if (dev->type->resume) { + pm_dev_dbg(dev, state, "legacy type "); + error = dev->type->resume(dev); + } + if (error) + goto End; } - if (!error && dev->class && dev->class->resume) { - dev_dbg(dev,"class resume\n"); - error = dev->class->resume(dev); + if (dev->class) { + if (dev->class->pm) { + pm_dev_dbg(dev, state, "class "); + error = pm_op(dev, dev->class->pm, state); + } else if (dev->class->resume) { + pm_dev_dbg(dev, state, "legacy class "); + error = dev->class->resume(dev); + } } - + End: up(&dev->sem); TRACE_RESUME(error); @@ -228,9 +391,9 @@ static int resume_device(struct device * * went through the early resume. * * Take devices from the dpm_off_list, resume them, - * and put them on the dpm_locked list. + * and put them on the dpm_active list. */ -static void dpm_resume(void) +static void dpm_resume(pm_message_t state) { mutex_lock(&dpm_list_mtx); all_sleeping = false; @@ -239,9 +402,10 @@ static void dpm_resume(void) struct device *dev = to_device(entry); list_move_tail(entry, &dpm_active); - dev->power.sleeping = false; + dev->power.status = DPM_ON; mutex_unlock(&dpm_list_mtx); - resume_device(dev); + resume_device(dev, state); + complete_device(dev, state); mutex_lock(&dpm_list_mtx); } mutex_unlock(&dpm_list_mtx); @@ -269,14 +433,15 @@ static void unregister_dropped_devices(v /** * device_resume - Restore state of each device in system. + * @state: Operation to carry out. * * Resume all the devices, unlock them all, and allow new * devices to be registered once again. */ -void device_resume(void) +void device_resume(pm_message_t state) { might_sleep(); - dpm_resume(); + dpm_resume(state); unregister_dropped_devices(); } EXPORT_SYMBOL_GPL(device_resume); @@ -284,37 +449,51 @@ EXPORT_SYMBOL_GPL(device_resume); /*------------------------- Suspend routines -------------------------*/ -static inline char *suspend_verb(u32 event) +/** + * resume_event - return a PM message representing the resume event + * corresponding to given sleep state. + * @sleep_state - PM message representing a sleep state + */ +static pm_message_t resume_event(pm_message_t sleep_state) { - switch (event) { - case PM_EVENT_SUSPEND: return "suspend"; - case PM_EVENT_FREEZE: return "freeze"; - case PM_EVENT_PRETHAW: return "prethaw"; - default: return "(unknown suspend event)"; + switch (sleep_state.event) { + case PM_EVENT_SUSPEND: + return PMSG_RESUME; + case PM_EVENT_FREEZE: + case PM_EVENT_QUIESCE: + return PMSG_RECOVER; + case PM_EVENT_HIBERNATE: + return PMSG_RESTORE; } -} - -static void -suspend_device_dbg(struct device *dev, pm_message_t state, char *info) -{ - dev_dbg(dev, "%s%s%s\n", info, suspend_verb(state.event), - ((state.event == PM_EVENT_SUSPEND) && device_may_wakeup(dev)) ? - ", may wakeup" : ""); + return PMSG_ON; } /** - * suspend_device_late - Shut down one device (late suspend). + * suspend_device_noirq - Shut down one device (late suspend). * @dev: Device. - * @state: Power state device is entering. + * @state: PM message representing the operation to perform. * * This is called with interrupts off and only a single CPU running. */ -static int suspend_device_late(struct device *dev, pm_message_t state) +static int suspend_device_noirq(struct device *dev, pm_message_t state) { int error = 0; - if (dev->bus && dev->bus->suspend_late) { - suspend_device_dbg(dev, state, "LATE "); + if (!dev->bus) + return 0; + + if (dev->bus->pm_noirq) { + struct pm_ops *ops = dev->bus->pm_noirq; + + pm_dev_dbg(dev, state, "LATE "); + if (ops->prepare) + error = ops->prepare(dev); + if (!error) + error = pm_op(dev, ops, state); + if (error && ops->complete) + ops->complete(dev); + } else if (dev->bus->suspend_late) { + pm_dev_dbg(dev, state, "legacy LATE "); error = dev->bus->suspend_late(dev, state); suspend_report_result(dev->bus->suspend_late, error); } @@ -323,7 +502,7 @@ static int suspend_device_late(struct de /** * device_power_down - Shut down special devices. - * @state: Power state to enter. + * @state: PM message representing the operation to perform. * * Power down devices that require interrupts to be disabled * and move them from the dpm_off list to the dpm_off_irq list. @@ -339,7 +518,7 @@ int device_power_down(pm_message_t state struct list_head *entry = dpm_off.prev; struct device *dev = to_device(entry); - error = suspend_device_late(dev, state); + error = suspend_device_noirq(dev, state); if (error) { printk(KERN_ERR "Could not power down device %s: " "error %d\n", @@ -353,12 +532,50 @@ int device_power_down(pm_message_t state if (!error) error = sysdev_suspend(state); if (error) - dpm_power_up(); + dpm_power_up(resume_event(state)); return error; } EXPORT_SYMBOL_GPL(device_power_down); /** + * prepare_device - Execute the ->prepare() callback(s) for given device. + * @dev: Device. + * @state: PM operation we are preparing for. + */ +static int prepare_device(struct device *dev, pm_message_t state) +{ + int error = 0; + + down(&dev->sem); + + if (dev->bus && dev->bus->pm && dev->bus->pm->prepare) { + pm_dev_dbg(dev, state, "preparing "); + error = dev->bus->pm->prepare(dev); + suspend_report_result(dev->bus->pm->prepare, error); + if (error) + goto End; + } + + if (dev->type && dev->type->pm && dev->type->pm->prepare) { + pm_dev_dbg(dev, state, "preparing type "); + error = dev->type->pm->prepare(dev); + suspend_report_result(dev->type->pm->prepare, error); + if (error) + goto End; + } + + if (dev->class && dev->class->pm && dev->class->pm->prepare) { + pm_dev_dbg(dev, state, "preparing class "); + error = dev->class->pm->prepare(dev); + suspend_report_result(dev->class->pm->prepare, error); + } + End: + up(&dev->sem); + + return error; +} + +/** * suspend_device - Save state of one device. * @dev: Device. * @state: Power state device is entering. @@ -369,29 +586,43 @@ static int suspend_device(struct device down(&dev->sem); - if (dev->power.power_state.event) { - dev_dbg(dev, "PM: suspend %d-->%d\n", - dev->power.power_state.event, state.event); - } - - if (dev->class && dev->class->suspend) { - suspend_device_dbg(dev, state, "class "); - error = dev->class->suspend(dev, state); - suspend_report_result(dev->class->suspend, error); + if (dev->class) { + if (dev->class->pm) { + pm_dev_dbg(dev, state, "class "); + error = pm_op(dev, dev->class->pm, state); + } else if (dev->class->suspend) { + pm_dev_dbg(dev, state, "legacy class "); + error = dev->class->suspend(dev, state); + suspend_report_result(dev->class->suspend, error); + } + if (error) + goto End; } - if (!error && dev->type && dev->type->suspend) { - suspend_device_dbg(dev, state, "type "); - error = dev->type->suspend(dev, state); - suspend_report_result(dev->type->suspend, error); + if (dev->type) { + if (dev->type->pm) { + pm_dev_dbg(dev, state, "type "); + error = pm_op(dev, dev->type->pm, state); + } else if (dev->type->suspend) { + pm_dev_dbg(dev, state, "legacy type "); + error = dev->type->suspend(dev, state); + suspend_report_result(dev->type->suspend, error); + } + if (error) + goto End; } - if (!error && dev->bus && dev->bus->suspend) { - suspend_device_dbg(dev, state, ""); - error = dev->bus->suspend(dev, state); - suspend_report_result(dev->bus->suspend, error); + if (dev->bus) { + if (dev->bus->pm) { + pm_dev_dbg(dev, state, ""); + error = pm_op(dev, dev->bus->pm, state); + } else if (dev->bus->suspend) { + pm_dev_dbg(dev, state, "legacy "); + error = dev->bus->suspend(dev, state); + suspend_report_result(dev->bus->suspend, error); + } } - + End: up(&dev->sem); return error; @@ -401,56 +632,70 @@ static int suspend_device(struct device * dpm_suspend - Suspend every device. * @state: Power state to put each device in. * - * Walk the dpm_locked list. Suspend each device and move it + * Walk the dpm_active list. Suspend each device and move it * to the dpm_off list. - * - * (For historical reasons, if it returns -EAGAIN, that used to mean - * that the device would be called again with interrupts disabled. - * These days, we use the "suspend_late()" callback for that, so we - * print a warning and consider it an error). */ static int dpm_suspend(pm_message_t state) { - int error = 0; - mutex_lock(&dpm_list_mtx); while (!list_empty(&dpm_active)) { struct list_head *entry = dpm_active.prev; struct device *dev = to_device(entry); + int error; + + WARN_ON(dev->parent && dev->parent->power.status == DPM_OFF); + dev->power.status = DPM_SUSPENDING; + mutex_unlock(&dpm_list_mtx); + + error = prepare_device(dev, state); - WARN_ON(dev->parent && dev->parent->power.sleeping); + mutex_lock(&dpm_list_mtx); + if (error == -EAGAIN) + dev->power.status = DPM_ON; + else if(error) + goto Error; + if (dev->power.status == DPM_ON) { + /* + * prepare_device() returned -EAGAIN or a child was + * added during it + */ + mutex_unlock(&dpm_list_mtx); + + complete_device(dev, resume_event(state)); - dev->power.sleeping = true; + mutex_lock(&dpm_list_mtx); + continue; + } + dev->power.status = DPM_OFF; mutex_unlock(&dpm_list_mtx); + error = suspend_device(dev, state); + mutex_lock(&dpm_list_mtx); - if (error) { - printk(KERN_ERR "Could not suspend device %s: " - "error %d%s\n", - kobject_name(&dev->kobj), - error, - (error == -EAGAIN ? - " (please convert to suspend_late)" : - "")); - dev->power.sleeping = false; - break; + if (!error) { + if (!list_empty(&dev->power.entry)) + list_move(&dev->power.entry, &dpm_off); + continue; } - if (!list_empty(&dev->power.entry)) - list_move(&dev->power.entry, &dpm_off); + Error: + printk(KERN_ERR "Could not suspend device %s: error %d\n", + kobject_name(&dev->kobj), error); + dev->power.status = DPM_ON; + mutex_unlock(&dpm_list_mtx); + + complete_device(dev, resume_event(state)); + return error; } - if (!error) - all_sleeping = true; + all_sleeping = true; mutex_unlock(&dpm_list_mtx); - - return error; + return 0; } /** * device_suspend - Save state and stop all devices in system. * @state: new power management state * - * Prevent new devices from being registered, then lock all devices - * and suspend them. + * Prepare and suspend all devices. */ int device_suspend(pm_message_t state) { @@ -459,7 +704,7 @@ int device_suspend(pm_message_t state) might_sleep(); error = dpm_suspend(state); if (error) - device_resume(); + device_resume(resume_event(state)); return error; } EXPORT_SYMBOL_GPL(device_suspend); Index: linux-2.6/include/linux/device.h =================================================================== --- linux-2.6.orig/include/linux/device.h +++ linux-2.6/include/linux/device.h @@ -69,6 +69,9 @@ struct bus_type { int (*resume_early)(struct device *dev); int (*resume)(struct device *dev); + struct pm_ops *pm; + struct pm_ops *pm_noirq; + struct bus_type_private *p; }; @@ -202,6 +205,8 @@ struct class { int (*suspend)(struct device *dev, pm_message_t state); int (*resume)(struct device *dev); + + struct pm_ops *pm; }; extern int __must_check class_register(struct class *class); @@ -345,8 +350,11 @@ struct device_type { struct attribute_group **groups; int (*uevent)(struct device *dev, struct kobj_uevent_env *env); void (*release)(struct device *dev); + int (*suspend)(struct device *dev, pm_message_t state); int (*resume)(struct device *dev); + + struct pm_ops *pm; }; /* interface for exporting device attributes */ Index: linux-2.6/kernel/power/disk.c =================================================================== --- linux-2.6.orig/kernel/power/disk.c +++ linux-2.6/kernel/power/disk.c @@ -224,7 +224,7 @@ static int create_image(int platform_mod /* NOTE: device_power_up() is just a resume() for devices * that suspended with irqs off ... no overall powerup. */ - device_power_up(); + device_power_up(in_suspend ? PMSG_RECOVER : PMSG_RESTORE); Enable_irqs: local_irq_enable(); return error; @@ -280,7 +280,7 @@ int hibernation_snapshot(int platform_mo Finish: platform_finish(platform_mode); Resume_devices: - device_resume(); + device_resume(in_suspend ? PMSG_RECOVER : PMSG_RESTORE); Resume_console: resume_console(); Close: @@ -301,7 +301,7 @@ static int resume_target_kernel(void) int error; local_irq_disable(); - error = device_power_down(PMSG_PRETHAW); + error = device_power_down(PMSG_QUIESCE); if (error) { printk(KERN_ERR "PM: Some devices failed to power down, " "aborting resume\n"); @@ -329,7 +329,7 @@ static int resume_target_kernel(void) swsusp_free(); restore_processor_state(); touch_softlockup_watchdog(); - device_power_up(); + device_power_up(PMSG_THAW); Enable_irqs: local_irq_enable(); return error; @@ -350,7 +350,7 @@ int hibernation_restore(int platform_mod pm_prepare_console(); suspend_console(); - error = device_suspend(PMSG_PRETHAW); + error = device_suspend(PMSG_QUIESCE); if (error) goto Finish; @@ -362,7 +362,7 @@ int hibernation_restore(int platform_mod enable_nonboot_cpus(); } platform_restore_cleanup(platform_mode); - device_resume(); + device_resume(PMSG_RECOVER); Finish: resume_console(); pm_restore_console(); @@ -419,7 +419,7 @@ int hibernation_platform_enter(void) Finish: hibernation_ops->finish(); Resume_devices: - device_resume(); + device_resume(PMSG_RESTORE); Resume_console: resume_console(); Close: Index: linux-2.6/kernel/power/main.c =================================================================== --- linux-2.6.orig/kernel/power/main.c +++ linux-2.6/kernel/power/main.c @@ -239,7 +239,7 @@ static int suspend_enter(suspend_state_t if (!suspend_test(TEST_CORE)) error = suspend_ops->enter(state); - device_power_up(); + device_power_up(PMSG_RESUME); Done: arch_suspend_enable_irqs(); BUG_ON(irqs_disabled()); @@ -291,7 +291,7 @@ int suspend_devices_and_enter(suspend_st if (suspend_ops->finish) suspend_ops->finish(); Resume_devices: - device_resume(); + device_resume(PMSG_RESUME); Resume_console: resume_console(); Close: Index: linux-2.6/arch/x86/kernel/apm_32.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/apm_32.c +++ linux-2.6/arch/x86/kernel/apm_32.c @@ -1221,9 +1221,9 @@ static int suspend(int vetoable) if (err != APM_SUCCESS) apm_error("suspend", err); err = (err == APM_SUCCESS) ? 0 : -EIO; - device_power_up(); + device_power_up(PMSG_RESUME); local_irq_enable(); - device_resume(); + device_resume(PMSG_RESUME); pm_send_all(PM_RESUME, (void *)0); queue_event(APM_NORMAL_RESUME, NULL); out: -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html