On Saturday, June 11, 2011, Kevin Hilman wrote: > "Rafael J. Wysocki" <rjw@xxxxxxx> writes: > > [...] > > > Whether or not user space has disabled runtime PM _doesn't_ _matter_ for > > system suspend, because _you_ _can't_ call pm_runtime_suspend(), or > > pm_runtime_put_sunc(), from a driver's .suspend() callback _anyway_. > > The reason is that doing that would cause the subsystem's (or power > > domain's in this case) .runtime_suspend() callback to be invoked and > > that's incorrect. Namely, it would require the subsystem (power domain) > > to expect that its .runtime_suspend() would always be executed indirectly > > as a result of calling its .suspend() (through the driver's callback) > > and that expectation may or may not be met (depending on the driver's > > design). > > So here's an interesting scenario which I think it triggers the same > problem as you highlight above. > > Assume you have a driver that's using runtime PM on a per-xfer basis. > Before each xfer, it does a pm_runtime_get_sync(), after each xfer it > does a pm_runtime_put_sync() (for this example, it's important that it's > a _put_sync()). The _put_sync() might happen in an ISR, It can't happen in an ISR, to be precise. > or possibly in a thread waiting on a completion which is awoken by the ISR, > etc. etc. (the runtime PM callbacks are IRQ safe, and device is marked as such.) > > The driver is in the middle of an xfer and a system suspend request > happens. > > The driver's ->suspend() callback happens, and the driver > > - enables/disables wakeups based on device_may_wakeup() > - prevents future xfers > - waits for current xfer to finish > > As soon as the xfer finishes, the driver gets notified (completion, > callback, IRQ, whatever) and calls pm_runtime_put_sync(), which triggers > subsys->runtime_suspend --> driver->runtime_suspend. > > While the driver's ->suspend() callback doesn't directly call > pm_runtime_put_sync(), the act of waiting for the xfer to finish > causes the subsystem/driver->runtime_suspend callbacks to be called > during the subsytem/driver->suspend callback, which is the same problem > as you highlight above. It's not exactly the same. The difference is that you're talking about race conditions between runtime PM and system suspend (I kind of know why I wanted system suspend to block runtime PM now :-)) that may be prevented by subsystem-level code from happening (by using locking and some flags etc.), while that code cannot do much if its .runtime_suspend() callback, for example, is executed directly from the system suspend code path. > Based on your commit that removed incrementing the usage count across > suspend[1], you mentioned "we can rely on subsystems and device drivers > to avoid doing that unnecessarily." The above example shows that this > type of thing might not be that obvious to detect and thus avoid. > > I suspect the solution to the above will be to add back the usage count > increment across system suspend, but I'm hoping not. IMO, it would be > more flexible to allow the subsystems to decide. The subsystems could > provide locking (or manage dev->power.usage_count) themselves if > necessary. For example, leave it to the subsystem->prepare() to > pm_runtime_get_noresume() if it wants to avoid the "nesting" of > callbacks. I agree. > A related question: does the pm_wq need to be freezable? From > Documentation/power/runtime_pm.txt: > > * The power management workqueue pm_wq in which bus types and device drivers can > put their PM-related work items. It is strongly recommended that pm_wq be > used for queuing all work items related to run-time PM, because this allows > them to be synchronized with system-wide power transitions (suspend to RAM, > hibernation and resume from system sleep states). pm_wq is declared in > include/linux/pm_runtime.h and defined in kernel/power/main.c. > > Is "synchronized with system-wide power transistions" correct here? > Rather than synchronize, using a freezable workqueue actually _prevents_ > runtime PM events (at least async ones.) > > Again, proper locking (or management of dev->power.usage_count) at the > subsystem level would get you the same effect, but still leave > flexibility to the subsystem/pwr_domain layer. No, please. The problem here is that I don't want runtime PM stuff to be called during the "noirq" stages of system suspend and resume which the freezing of the workqueue takes care of nicely. > P.S. the commit below[1] removed the usage count increment/decrement > across system suspend/resume, but Documentation/power/runtime_pm.txt > still refers to it. Patch below[2] removes it, ssuming you're > not planning on adding it back. ;) No, I'm not. In fact, I'm going to apply your patch. :-) Thanks, Rafael > [1] > commit e8665002477f0278f84f898145b1f141ba26ee26 > Author: Rafael J. Wysocki <rjw@xxxxxxx> > Date: Sat Feb 12 01:42:41 2011 +0100 > > PM: Allow pm_runtime_suspend() to succeed during system suspend > > The dpm_prepare() function increments the runtime PM reference > counters of all devices to prevent pm_runtime_suspend() from > executing subsystem-level callbacks. However, this was supposed to > guard against a specific race condition that cannot happen, because > the power management workqueue is freezable, so pm_runtime_suspend() > can only be called synchronously during system suspend and we can > rely on subsystems and device drivers to avoid doing that > unnecessarily. > > Make dpm_prepare() drop the runtime PM reference to each device > after making sure that runtime resume is not pending for it. > > Signed-off-by: Rafael J. Wysocki <rjw@xxxxxxx> > Acked-by: Kevin Hilman <khilman@xxxxxx> > > [2] > From 8968e3e41d785e7e5ce7584d64f6a55b303e7060 Mon Sep 17 00:00:00 2001 > From: Kevin Hilman <khilman@xxxxxx> > Date: Fri, 10 Jun 2011 16:05:51 -0700 > Subject: [PATCH] PM / Runtime: update doc: usage count no longer incremented across system PM > > commit e8665002477f0278f84f898145b1f141ba26ee26 (PM: Allow > pm_runtime_suspend() to succeed during system suspend) removed usage > count increment across system PM. > > Update doc to reflect this. > > Signed-off-by: Kevin Hilman <khilman@xxxxxx> > --- > Applies on v3.0-rc2 > > Documentation/power/runtime_pm.txt | 5 ----- > 1 files changed, 0 insertions(+), 5 deletions(-) > > diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt > index 654097b..22accb3 100644 > --- a/Documentation/power/runtime_pm.txt > +++ b/Documentation/power/runtime_pm.txt > @@ -566,11 +566,6 @@ to do this is: > pm_runtime_set_active(dev); > pm_runtime_enable(dev); > > -The PM core always increments the run-time usage counter before calling the > -->prepare() callback and decrements it after calling the ->complete() callback. > -Hence disabling run-time PM temporarily like this will not cause any run-time > -suspend callbacks to be lost. > - > 7. Generic subsystem callbacks > > Subsystems may wish to conserve code space by using the set of generic power > -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html