Re: [PATCH] PM: acquire device locks prior to suspending

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 13 Dec 2007 Andrew Morton wrote:

> On Thu, 13 Dec 2007 17:58:37 +0100 "Rafael J. Wysocki" <rjw at sisk.pl <https://lists.linux-foundation.org/mailman/listinfo/linux-pm>> wrote:
> 
> > On Thursday, 13 of December 2007, Andrew Morton wrote:
> > > On Fri, 21 Sep 2007 15:37:40 -0400 (EDT) Alan Stern <stern at rowland.harvard.edu <https://lists.linux-foundation.org/mailman/listinfo/linux-pm>> wrote:
> > > 
> > > > This patch (as994) reorganizes the way suspend and resume
> > > > notifications are sent to drivers.  The major changes are that now the
> > > > PM core acquires every device semaphore before calling the methods,
> > > > and calls to device_add() during suspends will fail.
> > > 
> > > Causes my t61p to deadlock during suspend-to-RAM.  Really late - the little
> > > moon symbol has started to flash but the LCD is still powered and the
> > > cursor still blinks.  Only a poweroff restores control.
> > 
> > Most probably, one of the drivers or a CPU hotplug notifier unregisters a
> > device during suspend (wrong).
> > 
> > Please boot with no_console_suspend and check if the box survives (with this
> > patch applied):
> > 
> > # echo 8 > /proc/sys/kernel/printk
> > # echo processors > /sys/power/pm_test
> > # echo mem > /sys/power/state
> > 
> > If it doesn't, you can try 
> > 
> > # echo platform > /sys/power/pm_test
> > # echo mem > /sys/power/state
> > 
> > and
> > 
> > # echo devices > /sys/power/pm_test
> > # echo mem > /sys/power/state
> 
> hm, that was all fairly helpful.  <looks at the document> <erk, long>
> 
> This:
...
> gives me this:
> 
> http://userweb.kernel.org/~akpm/pc131699.jpg
> 
> which identifies your culprit: msr.
> 
> I'd suggest that the above debug patch be turned into some
> boot-option-enabled thing and that it be rolled out with this locking
> change for a while at least.

Going through the debugging changes you made led to some
process-of-elimination reasoning:

	There's no need to trace the locking of dpm_list_mutex,
	since the things done while that mutex are held are quite
	safe and should never deadlock (other than trying to
	acquire other locks!).

	There's no need to trace the locking of pm_sleep_rwsem.
	It gets locked for writing just once, at the very start
	of the device-suspend sequence.  If that failed, it would
	be easy to diagnose.  The only other place it is locked
	uses down_read_trylock, which can't block.

	This leaves only dev->sem, which is acquired in
	lock_all_devices and device_pm_remove.  But the call in
	device_pm_remove isn't a good place to monitor.  It worked
	in this case, but if the device has a driver then the deadlock
	will occur earlier, in device_release_driver.

	On further thought, the call in lock_all_devices should be
	safe enough.  It doesn't do anything until all the locks
	have been acquired, and so far it hasn't caused any problems.

Put together, this suggests a better strategy.  Instead of monitoring
the various locking calls, instead we should warn about dangerous
actions on the part of drivers.  In both cases where problems
occurred, it was because a driver tried to unregister a device during
suspend.  It's easy to detect such actions and print a big fat
warning.  We might as well warn about attempts to register too.

Andrew, can you confirm that the patch below, without any of your
changes, would indeed have caught the problem with the msr driver?
For proper testing you have to:

	boot with "no_console_suspend";

	"echo 8 >/proc/sys/kernel/printk" before suspending.
	Perhaps also "echo processors >/sys/power/pm_test".

If it does, I'll submit this properly with a changelog and
Signed-Off-By line.  It's a lot simpler than your checks, doesn't need
a command-line parameter, and can safely be used all the time.

Alan Stern


Index: usb-2.6/drivers/base/core.c
===================================================================
--- usb-2.6.orig/drivers/base/core.c
+++ usb-2.6/drivers/base/core.c
@@ -789,8 +789,11 @@ int device_add(struct device *dev)
 	int error;
 
 	error = pm_sleep_lock();
-	if (error)
+	if (error) {
+		dev_warn(dev, "Illegal %s during suspend\n", __FUNCTION__);
+		dump_stack();
 		return error;
+	}
 
 	dev = get_device(dev);
 	if (!dev || !strlen(dev->bus_id)) {
@@ -953,6 +956,13 @@ void device_del(struct device * dev)
 	struct device * parent = dev->parent;
 	struct class_interface *class_intf;
 
+	if (pm_sleep_lock()) {
+		dev_warn(dev, "Illegal %s during suspend\n", __FUNCTION__);
+		dump_stack();
+	} else {
+		pm_sleep_unlock();
+	}
+
 	if (parent)
 		klist_del(&dev->knode_parent);
 	if (MAJOR(dev->devt))

_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/linux-pm

[Index of Archives]     [Linux ACPI]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [CPU Freq]     [Kernel Newbies]     [Fedora Kernel]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux