[linux-pm] [PATCH 2/2] Fix console handling during suspend/resume

On Fri, 23 Jun 2006, Linus Torvalds wrote:
> 
> Let me reboot my current kernel to test my current five-phase thing, and 
> I'll do the subsystem thing too.

Ok, here.

This simple patch is nothing but cleanups, cleanups, cleanups.

And in the process, _I_ think it helps the suspend infrastructure a lot.

I don't know how many people have ever actually _looked_ closely at how 
horrible the ->suspend() sequence was, but let's just say that it was hard 
to make sense of how dpm_active->dpm_off worked, and what dpm_off_irq 
actually did. More importantly, it was basically impossible for devices to 
sanely use the whole dpm_off_irq logic (I doubt anybody ever did - you 
would return -EAGAIN to move you into the dpm_off_irq queue, but the 
recovery was pretty damn undefined - you'd then get "resumed" even 
though you never successfully suspended etc).

Btw, if anybody had ever actually used the "dpm_off_irq" thing, they
should have seen a huge warning about the semaphore sleeping with
interrupts off, so I'm pretty sure nobody ever really used it.  Since I
think it was unusable, I'm not surprised. 

The sane version has a very simple sequence:

 - devices start on "dpm_active". 

 - "suspend_prepare()" is called for every device (with the semaphore 
   held, you are _not_ allowed to try to unlink yourself in the prepare 
   function)

 - then, we iterate over every device, and move it from "dpm_active" to 
   "dpm_off" when calling "suspend()". The suspend function is now the 
   subsystem suspend, followed by the device bus suspend.

   (Of course, no subsystem actually _implements_ a suspend yet, but this 
   is where a network class could shut off the generic network stack 
   stuff, ie NAPI polling etc)

 - we now disable interrupts

 - then, we iterate over every device on "dpm_off", and move it to 
   "dpm_off_irq", while calling "suspend_late()"

 - we now actually suspend (system devices go here too).

 - then, we resume in the reverse order: iterate over "dpm_off_irq", 
   moving the devices to "dpm_off", while calling "resume_early".

 - enable interrupts

 - then, we iterate over "dpm_off", moving devices to "dpm_active" while 
   calling the "resume" function(s) - first the bus resume, then the class 
   resume.

And that's it.
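
To make the ordering concrete, here is a toy userspace model of the sequence above. It is only a sketch and not part of the patch: "toy_dev", "note()", "run_phase()" and the one-letter phase tags are made up for illustration, and an array index stands in for the real list linkage. It assumes devices are suspended last-registered-first and resumed first-registered-first, which is what the list walking in the patch below amounts to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the three lists named above. */
enum dpm_list { DPM_ACTIVE, DPM_OFF, DPM_OFF_IRQ };

struct toy_dev {
	char tag;		/* one-letter name used in the trace */
	enum dpm_list list;	/* list the device currently sits on */
};

#define TRACE_MAX 128
static char trace[TRACE_MAX];
static size_t trace_len;

/* Append "<phase><tag> " to the trace, e.g. "sB " for suspend of B. */
static void note(char phase, char tag)
{
	assert(trace_len + 4 <= TRACE_MAX);
	trace[trace_len++] = phase;
	trace[trace_len++] = tag;
	trace[trace_len++] = ' ';
	trace[trace_len] = '\0';
}

/*
 * Call one phase on every device sitting on 'from' and move it to 'to'.
 * reverse != 0 walks the array backwards (suspend direction).
 */
static void run_phase(struct toy_dev *devs, int n, char phase,
		      enum dpm_list from, enum dpm_list to, int reverse)
{
	for (int k = 0; k < n; k++) {
		struct toy_dev *d = &devs[reverse ? n - 1 - k : k];

		if (d->list != from)
			continue;
		note(phase, d->tag);
		d->list = to;
	}
}

static void toy_suspend_resume_cycle(struct toy_dev *devs, int n)
{
	memset(trace, 0, sizeof(trace));
	trace_len = 0;

	/* 'p': suspend_prepare, devices stay on dpm_active */
	run_phase(devs, n, 'p', DPM_ACTIVE, DPM_ACTIVE, 1);
	/* 's': suspend, dpm_active -> dpm_off */
	run_phase(devs, n, 's', DPM_ACTIVE, DPM_OFF, 1);
	/* interrupts would be disabled here */
	/* 'l': suspend_late, dpm_off -> dpm_off_irq */
	run_phase(devs, n, 'l', DPM_OFF, DPM_OFF_IRQ, 1);
	/* the machine would actually sleep here */
	/* 'e': resume_early, dpm_off_irq -> dpm_off */
	run_phase(devs, n, 'e', DPM_OFF_IRQ, DPM_OFF, 0);
	/* interrupts would be enabled again here */
	/* 'r': resume, dpm_off -> dpm_active */
	run_phase(devs, n, 'r', DPM_OFF, DPM_ACTIVE, 0);
}
```

Calling toy_suspend_resume_cycle() on an array of devices that all start on DPM_ACTIVE leaves the callback order in "trace" and puts every device back on DPM_ACTIVE.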

The nice part here is the error management (which, quite frankly, was
insane with the old "dpm_off_irq" scheme).  In the new scheme, the lists
always mean the same thing, so if you have errors half-way, you know
_exactly_ what you've called, and you will undo _exactly_ the right
thing (ie if you had an error half-way through the "suspend_late" phase,
you will only call "resume_early" on those devices that went through the
suspend_late). 
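
A rough sketch of that unwinding, again as a toy userspace model rather than kernel code. The names ("toy_dev", "toy_power_down()", "toy_power_up()", "toy_late_phase()") and the call counters are invented for the example, and calling the power-up pass straight from the failure path is an assumption made for the sketch, not a claim about what the error label in the patch does.

```c
#include <assert.h>

enum dpm_list { DPM_ACTIVE, DPM_OFF, DPM_OFF_IRQ };

struct toy_dev {
	enum dpm_list list;
	int late_error;		/* nonzero: suspend_late() fails with this code */
	int late_calls;		/* how often suspend_late() ran */
	int early_calls;	/* how often resume_early() ran */
};

static int toy_suspend_late(struct toy_dev *d)
{
	d->late_calls++;
	return d->late_error;
}

static void toy_resume_early(struct toy_dev *d)
{
	d->early_calls++;
}

/*
 * Late phase: walk backwards, stop at the first failure. The failing
 * device and everything not yet visited stay on DPM_OFF.
 */
static int toy_power_down(struct toy_dev *devs, int n)
{
	for (int i = n - 1; i >= 0; i--) {
		int error;

		if (devs[i].list != DPM_OFF)
			continue;
		error = toy_suspend_late(&devs[i]);
		if (error)
			return error;
		devs[i].list = DPM_OFF_IRQ;
	}
	return 0;
}

/* Undo: only devices that actually reached DPM_OFF_IRQ get resume_early. */
static void toy_power_up(struct toy_dev *devs, int n)
{
	for (int i = 0; i < n; i++) {
		if (devs[i].list != DPM_OFF_IRQ)
			continue;
		assert(devs[i].late_calls > 0);	/* list membership implies the call */
		toy_resume_early(&devs[i]);
		devs[i].list = DPM_OFF;
	}
}

/* Assumed glue for the sketch: roll back immediately on failure. */
static int toy_late_phase(struct toy_dev *devs, int n)
{
	int error = toy_power_down(devs, n);

	if (error)
		toy_power_up(devs, n);
	return error;
}
```

The point of the model is that the list a device sits on is the only bookkeeping needed: the rollback never has to guess which callbacks ran.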

And more importantly, the nice thing is that devices now have access to 
the early/late suspend functionality.

Now, I only did the PCI infrastructure for that - other buses will simply 
not pass on the early/late events, because they don't support them. In 
practice, most other buses probably don't even want to (ie the whole 
notion doesn't make any sense for a SCSI device or for a USB device - 
there's nothing you can do with interrupts off to the device _anyway_).
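
As an illustration of "simply not pass on", here is a cut-down userspace sketch of the NULL-check pattern. Everything in it ("toy_bus", "toy_device", the two toy bus instances) is hypothetical and much smaller than the real struct bus_type; it only shows that a bus leaving the late hook unset means the core skips that phase for its devices.

```c
#include <assert.h>
#include <stddef.h>

struct toy_device;

/* Hypothetical, cut-down bus_type: only the two suspend hooks. */
struct toy_bus {
	int (*suspend)(struct toy_device *dev);
	int (*suspend_late)(struct toy_device *dev);	/* NULL if unsupported */
};

struct toy_device {
	const struct toy_bus *bus;
	int suspended;
	int late_seen;
};

static int toy_generic_suspend(struct toy_device *dev)
{
	dev->suspended = 1;
	return 0;
}

static int toy_pci_suspend_late(struct toy_device *dev)
{
	assert(dev->suspended);	/* late phase always follows suspend */
	dev->late_seen = 1;
	return 0;
}

/* A bus that passes the late event on to its devices. */
static const struct toy_bus toy_pci_bus = {
	.suspend	= toy_generic_suspend,
	.suspend_late	= toy_pci_suspend_late,
};

/* A bus that does not: the late hook is simply left NULL. */
static const struct toy_bus toy_usb_bus = {
	.suspend	= toy_generic_suspend,
};

static int toy_suspend_device(struct toy_device *dev)
{
	if (dev->bus && dev->bus->suspend)
		return dev->bus->suspend(dev);
	return 0;
}

/* Core side: a missing hook is not an error, the phase is just skipped. */
static int toy_suspend_device_late(struct toy_device *dev)
{
	if (dev->bus && dev->bus->suspend_late)
		return dev->bus->suspend_late(dev);
	return 0;
}
```

A device on toy_pci_bus sees both phases, while one on toy_usb_bus sees only the regular suspend and the late call returns 0 without touching it.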

The patch is literally just 376 lines long. You can read it, and it all 
makes sense. This doesn't actually do any of the _devices_, of course, 
because to get there, I have to not only suspend the network device late, 
I obviously have to suspend the PCI _bus_ device late too (otherwise I'd 
suspend the network device after I suspended the bus it was on ;)

Simple enough to do, but I needed the infrastructure first.

Quite frankly, anybody who looks at this patch and doesn't say "that makes 
sense" has his head so far up his ass that it's not even funny.

(And no, it's not been very extensively tested. My Mac Mini still suspends 
and resumes, but that's not a big surprise, since it doesn't actually 
_use_ the new facilities provided by the infrastructure changes yet. That 
is for later..)

		Linus

---
diff --git a/drivers/base/power/resume.c b/drivers/base/power/resume.c
index 317edbf..bafd7d2 100644
--- a/drivers/base/power/resume.c
+++ b/drivers/base/power/resume.c
@@ -35,12 +35,31 @@ int resume_device(struct device * dev)
 		dev_dbg(dev,"resuming\n");
 		error = dev->bus->resume(dev);
 	}
+	if (dev->class && dev->class->resume) {
+		dev_dbg(dev,"class resume\n");
+		error = dev->class->resume(dev);
+	}
 	up(&dev->sem);
 	return error;
 }
 
 
+static int resume_device_early(struct device * dev)
+{
+	int error = 0;
 
+	if (dev->bus && dev->bus->resume_early) {
+		dev_dbg(dev,"EARLY resume\n");
+		error = dev->bus->resume_early(dev);
+	}
+	return error;
+}
+
+/*
+ * Resume the devices that have either not gone through
+ * the late suspend, or that did go through it but also
+ * went through the early resume
+ */
 void dpm_resume(void)
 {
 	down(&dpm_list_sem);
@@ -96,11 +115,9 @@ void dpm_power_up(void)
 		struct list_head * entry = dpm_off_irq.next;
 		struct device * dev = to_device(entry);
 
-		get_device(dev);
 		list_del_init(entry);
-		list_add_tail(entry, &dpm_active);
-		resume_device(dev);
-		put_device(dev);
+		list_add_tail(entry, &dpm_off);
+		resume_device_early(dev);
 	}
 }
 
diff --git a/drivers/base/power/suspend.c b/drivers/base/power/suspend.c
index 1a1fe43..2e6be8a 100644
--- a/drivers/base/power/suspend.c
+++ b/drivers/base/power/suspend.c
@@ -65,7 +65,19 @@ int suspend_device(struct device * dev, 
 
 	dev->power.prev_state = dev->power.power_state;
 
-	if (dev->bus && dev->bus->suspend && !dev->power.power_state.event) {
+	if (dev->class && dev->class->suspend && !dev->power.power_state.event) {
+		dev_dbg(dev, "class %s%s\n",
+			suspend_verb(state.event),
+			((state.event == PM_EVENT_SUSPEND)
+					&& device_may_wakeup(dev))
+				? ", may wakeup"
+				: ""
+			);
+		error = dev->class->suspend(dev, state);
+		suspend_report_result(dev->class->suspend, error);
+	}
+
+	if (!error && dev->bus && dev->bus->suspend && !dev->power.power_state.event) {
 		dev_dbg(dev, "%s%s\n",
 			suspend_verb(state.event),
 			((state.event == PM_EVENT_SUSPEND)
@@ -81,15 +93,74 @@ int suspend_device(struct device * dev, 
 }
 
 
+/*
+ * This is called with interrupts off, only a single CPU
+ * running. We can't do down() on a semaphore (and we don't
+ * need the protection)
+ */
+static int suspend_device_late(struct device *dev, pm_message_t state)
+{
+	int error = 0;
+
+	if (dev->power.power_state.event) {
+		dev_dbg(dev, "PM: suspend_late %d-->%d\n",
+			dev->power.power_state.event, state.event);
+	}
+
+	if (dev->bus && dev->bus->suspend_late && !dev->power.power_state.event) {
+		dev_dbg(dev, "LATE %s%s\n",
+			suspend_verb(state.event),
+			((state.event == PM_EVENT_SUSPEND)
+					&& device_may_wakeup(dev))
+				? ", may wakeup"
+				: ""
+			);
+		error = dev->bus->suspend_late(dev, state);
+		suspend_report_result(dev->bus->suspend_late, error);
+	}
+	return error;
+}
+
+/**
+ *	device_prepare_suspend - save state and prepare to suspend
+ *
+ *	NOTE! Devices cannot detach at this point - not only do we
+ *	hold the device list semaphores over the whole prepare, but
+ *	the whole point is to do non-invasive preparatory work, not
+ *	the actual suspend.
+ */
+int device_prepare_suspend(pm_message_t state)
+{
+	int error = 0;
+	struct device * dev;
+
+	down(&dpm_sem);
+	down(&dpm_list_sem);
+	list_for_each_entry_reverse(dev, &dpm_active, power.entry) {
+		if (!dev->bus || !dev->bus->suspend_prepare)
+			continue;
+		error = dev->bus->suspend_prepare(dev, state);
+		if (error)
+			break;
+	}
+	up(&dpm_list_sem);
+	up(&dpm_sem);
+	return error;
+}
+
 /**
  *	device_suspend - Save state and stop all devices in system.
  *	@state:		Power state to put each device in.
  *
  *	Walk the dpm_active list, call ->suspend() for each device, and move
- *	it to dpm_off.
- *	Check the return value for each. If it returns 0, then we move the
- *	the device to the dpm_off list. If it returns -EAGAIN, we move it to
- *	the dpm_off_irq list. If we get a different error, try and back out.
+ *	it to the dpm_off list.
+ *
+ *	(For historical reasons, if it returns -EAGAIN, that used to mean
+ *	that the device would be called again with interrupts disabled.
+ *	These days, we use the "suspend_late()" callback for that, so we
+ *	print a warning and consider it an error).
+ *
+ *	If we get a different error, try and back out.
  *
  *	If we hit a failure with any of the devices, call device_resume()
  *	above to bring the suspended devices back to life.
@@ -115,42 +186,29 @@ int device_suspend(pm_message_t state)
 
 		/* Check if the device got removed */
 		if (!list_empty(&dev->power.entry)) {
-			/* Move it to the dpm_off or dpm_off_irq list */
+			/* Move it to the dpm_off list */
 			if (!error) {
 				list_del(&dev->power.entry);
 				list_add(&dev->power.entry, &dpm_off);
-			} else if (error == -EAGAIN) {
-				list_del(&dev->power.entry);
-				list_add(&dev->power.entry, &dpm_off_irq);
-				error = 0;
 			}
 		}
 		if (error)
 			printk(KERN_ERR "Could not suspend device %s: "
-				"error %d\n", kobject_name(&dev->kobj), error);
+				"error %d%s\n",
+				kobject_name(&dev->kobj), error,
+				error == -EAGAIN ? " (please convert to suspend_late)" : "");
 		put_device(dev);
 	}
 	up(&dpm_list_sem);
-	if (error) {
-		/* we failed... before resuming, bring back devices from
-		 * dpm_off_irq list back to main dpm_off list, we do want
-		 * to call resume() on them, in case they partially suspended
-		 * despite returning -EAGAIN
-		 */
-		while (!list_empty(&dpm_off_irq)) {
-			struct list_head * entry = dpm_off_irq.next;
-			list_del(entry);
-			list_add(entry, &dpm_off);
-		}
+	if (error)
 		dpm_resume();
-	}
+
 	up(&dpm_sem);
 	return error;
 }
 
 EXPORT_SYMBOL_GPL(device_suspend);
 
-
 /**
  *	device_power_down - Shut down special devices.
  *	@state:		Power state to enter.
@@ -165,14 +223,18 @@ int device_power_down(pm_message_t state
 	int error = 0;
 	struct device * dev;
 
-	list_for_each_entry_reverse(dev, &dpm_off_irq, power.entry) {
-		if ((error = suspend_device(dev, state)))
-			break;
+	while (!list_empty(&dpm_off)) {
+		struct list_head * entry = dpm_off.prev;
+
+		dev = to_device(entry);
+		error = suspend_device_late(dev, state);
+		if (error)
+			goto Error;
+		list_del(&dev->power.entry);
+		list_add(&dev->power.entry, &dpm_off_irq);
 	}
-	if (error)
-		goto Error;
-	if ((error = sysdev_suspend(state)))
-		goto Error;
+
+	error = sysdev_suspend(state);
  Done:
 	return error;
  Error:
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 10e1a90..f0af89b 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -265,6 +265,19 @@ static int pci_device_remove(struct devi
 	return 0;
 }
 
+static int pci_device_suspend_prepare(struct device * dev, pm_message_t state)
+{
+	struct pci_dev * pci_dev = to_pci_dev(dev);
+	struct pci_driver * drv = pci_dev->driver;
+	int i = 0;
+
+	if (drv && drv->suspend_prepare) {
+		i = drv->suspend_prepare(pci_dev, state);
+		suspend_report_result(drv->suspend_prepare, i);
+	}
+	return i;
+}
+
 static int pci_device_suspend(struct device * dev, pm_message_t state)
 {
 	struct pci_dev * pci_dev = to_pci_dev(dev);
@@ -280,7 +293,19 @@ static int pci_device_suspend(struct dev
 	return i;
 }
 
+static int pci_device_suspend_late(struct device * dev, pm_message_t state)
+{
+	struct pci_dev * pci_dev = to_pci_dev(dev);
+	struct pci_driver * drv = pci_dev->driver;
+	int i = 0;
 
+	if (drv && drv->suspend_late) {
+		i = drv->suspend_late(pci_dev, state);
+		suspend_report_result(drv->suspend_late, i);
+	}
+	return i;
+}
+		
 /*
  * Default resume method for devices that have no driver provided resume,
  * or not even a driver at all.
@@ -314,6 +339,17 @@ static int pci_device_resume(struct devi
 	return error;
 }
 
+static int pci_device_resume_early(struct device * dev)
+{
+	int error = 0;
+	struct pci_dev * pci_dev = to_pci_dev(dev);
+	struct pci_driver * drv = pci_dev->driver;
+
+	if (drv && drv->resume_early)
+		error = drv->resume_early(pci_dev);
+	return error;
+}
+
 static void pci_device_shutdown(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
@@ -509,9 +545,12 @@ struct bus_type pci_bus_type = {
 	.uevent		= pci_uevent,
 	.probe		= pci_device_probe,
 	.remove		= pci_device_remove,
+	.suspend_prepare= pci_device_suspend_prepare,
 	.suspend	= pci_device_suspend,
-	.shutdown	= pci_device_shutdown,
+	.suspend_late	= pci_device_suspend_late,
+	.resume_early	= pci_device_resume_early,
 	.resume		= pci_device_resume,
+	.shutdown	= pci_device_shutdown,
 	.dev_attrs	= pci_dev_attrs,
 };
 
diff --git a/include/linux/device.h b/include/linux/device.h
index 1e5f30d..99d2a18 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -51,8 +51,12 @@ struct bus_type {
 	int		(*probe)(struct device * dev);
 	int		(*remove)(struct device * dev);
 	void		(*shutdown)(struct device * dev);
-	int		(*suspend)(struct device * dev, pm_message_t state);
-	int		(*resume)(struct device * dev);
+
+	int (*suspend_prepare)(struct device * dev, pm_message_t state);
+	int (*suspend)(struct device * dev, pm_message_t state);
+	int (*suspend_late)(struct device * dev, pm_message_t state);
+	int (*resume_early)(struct device * dev);
+	int (*resume)(struct device * dev);
 };
 
 extern int bus_register(struct bus_type * bus);
@@ -154,6 +158,9 @@ struct class {
 
 	void	(*release)(struct class_device *dev);
 	void	(*class_release)(struct class *class);
+
+	int	(*suspend)(struct device *, pm_message_t state);
+	int	(*resume)(struct device *);
 };
 
 extern int class_register(struct class *);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 62a8c22..9a762c8 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -344,7 +344,10 @@ struct pci_driver {
 	const struct pci_device_id *id_table;	/* must be non-NULL for probe to be called */
 	int  (*probe)  (struct pci_dev *dev, const struct pci_device_id *id);	/* New device inserted */
 	void (*remove) (struct pci_dev *dev);	/* Device removed (NULL if not a hot-plug capable driver) */
+	int  (*suspend_prepare) (struct pci_dev *dev, pm_message_t state);
 	int  (*suspend) (struct pci_dev *dev, pm_message_t state);	/* Device suspended */
+	int  (*suspend_late) (struct pci_dev *dev, pm_message_t state);
+	int  (*resume_early) (struct pci_dev *dev);
 	int  (*resume) (struct pci_dev *dev);	                /* Device woken up */
 	int  (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable);   /* Enable wake event */
 	void (*shutdown) (struct pci_dev *dev);
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 658c1b9..096fb6f 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -190,6 +190,7 @@ #ifdef CONFIG_PM
 extern suspend_disk_method_t pm_disk_mode;
 
 extern int device_suspend(pm_message_t state);
+extern int device_prepare_suspend(pm_message_t state);
 
 #define device_set_wakeup_enable(dev,val) \
 	((dev)->power.should_wakeup = !!(val))
diff --git a/kernel/power/main.c b/kernel/power/main.c
index cdf0f07..18a0f91 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -57,6 +57,10 @@ static int suspend_prepare(suspend_state
 	if (!pm_ops || !pm_ops->enter)
 		return -EPERM;
 
+	error = device_prepare_suspend(PMSG_SUSPEND);
+	if (error)
+		return error;
+
 	pm_prepare_console();
 
 	disable_nonboot_cpus();

