On Fri, Jul 29, 2011 at 7:32 PM, Felipe Balbi <balbi@xxxxxx> wrote: > Hi, > > On Fri, Jul 29, 2011 at 06:28:32PM +0530, Govindraj wrote: >> On Fri, Jul 29, 2011 at 5:49 PM, Felipe Balbi <balbi@xxxxxx> wrote: >> > On Fri, Jul 29, 2011 at 05:29:12PM +0530, Govindraj wrote: >> >> Yes fine, But there are scenarios even before first runtime_suspend happens, >> >> >> >> ex: uart_port_configure -> get_sync -> pm_generic_runtime_resume >> >> (omap_device_enable in this case) debug printk -> console_write -> get_sync. >> >> >> >> there are numerous such scenarios where we end from runtime context >> >> to runtime api context again, or jumping from one uart port operation >> >> to uart print operation. >> > >> > calling pm_runtime_get_sync() should not be a problem. It should only >> > increase the usage counters... This sounds like a race condition on the >> > driver, no ? >> >> Actually when we call a API to enable clocks we except the internals of API >> to just enable clocks and return. >> >> *Clock enable API should not cause or trigger to do a _device_driver_operation_ >> even before enabling clocks of the device-driver which called it* >> >> for uart context get_sync can land me to uart driver back >> even before enabling the uart clocks due to printks. > > only if _you_ have prints or _your_ runtime_*() calls, no ? > > Let's say omap_hwmod.c wants to do a print: > > -> printk() > -> pm_runtime_get_sync > -> console_write > -> pm_runtim_put > > now, if you have a printk() on your runtime_resume() before you enable > the clocks, then I can see why you would deadlock: > > -> pm_runtime_get_sync > -> omap_serial_runtime_resume > -> printk > -> pm_runtime_get_sync > -> omap_serial_runtime_resume > -> printk > -> pm_runtime_get_sync > ..... > > maybe I'm missing something, but can you add a stack dump on your > ->runtime_resume and ->runtime_suspend methods, just so we try to figure > out who's to blame here ? > >> > What you're experiencing, if I understood correctly, is a deadlock ? In >> > that case, can you try to track the locking mechanism on the omap-serial >> > driver to try to find if there isn't anything obviously wrong ? >> > >> >> Yes deadlocks. due to entering from runtime context to runtime context >> or entering from uart_port_operation to uart_console_write ops. >> >> There are already port locks used extensively within the uart driver >> to secure a port operation. >> >> But cannot secure a port operation while using clock_enable API. >> since clock enable API can land the control back to uart_console_write >> operation.. > > but in that case, if clock isn't enabled, why don't you just ignore the > print and enable the clock ?? Just return 0 and continue with > clk_enable() ?? > >> >> So either we should not have those prints from pm_generic layers or suppress >> >> them(seems pretty much a problem for a clean design within the driver >> >> using console_lock/unlock for every get_sync, and for >> >> runtime_put we cannot suppress the prints as it gets scheduled later) >> >> >> >> or if other folks who really need those prints form pm_generic* layers >> >> to debug and analysis then we have no other choice rather control >> >> the clk_enable/disable from outside driver code in idle path. >> > >> > yeah, none of these would be nice :-( >> > >> > I think this needs more debugging to be sure what's exactly going on. >> > What's exactly causing the deadlock ? Which lock is held and never >> > released ? >> > >> >> I had done some investigations, from scenarios it simply boils down to fact >> to handle clock within uart driver, uart driver expects clock enable API* used >> to just enable uart clocks but rather not trigger a _uart_ops_ within which >> kind of unacceptable from the uart_driver context. > > ok, now I see what you mean: > > 113 static int _omap_device_activate(struct omap_device *od, u8 ignore_lat) > 114 { > 115 struct timespec a, b, c; > 116 > 117 pr_debug("omap_device: %s: activating\n", od->pdev.name); > 118 > 119 while (od->pm_lat_level > 0) { > 120 struct omap_device_pm_latency *odpl; > 121 unsigned long long act_lat = 0; > 122 > 123 od->pm_lat_level--; > 124 > 125 odpl = od->pm_lats + od->pm_lat_level; > 126 > 127 if (!ignore_lat && > 128 (od->dev_wakeup_lat <= od->_dev_wakeup_lat_limit)) > 129 break; > 130 > 131 read_persistent_clock(&a); > 132 > 133 /* XXX check return code */ > 134 odpl->activate_func(od); > 135 > 136 read_persistent_clock(&b); > 137 > 138 c = timespec_sub(b, a); > 139 act_lat = timespec_to_ns(&c); > 140 > 141 pr_debug("omap_device: %s: pm_lat %d: activate: elapsed time " > 142 "%llu nsec\n", od->pdev.name, od->pm_lat_level, > 143 act_lat); > 144 > 145 if (act_lat > odpl->activate_lat) { > 146 odpl->activate_lat_worst = act_lat; > 147 if (odpl->flags & OMAP_DEVICE_LATENCY_AUTO_ADJUST) { > 148 odpl->activate_lat = act_lat; > 149 pr_warning("omap_device: %s.%d: new worst case " > 150 "activate latency %d: %llu\n", > 151 od->pdev.name, od->pdev.id, > 152 od->pm_lat_level, act_lat); > 153 } else > 154 pr_warning("omap_device: %s.%d: activate " > 155 "latency %d higher than exptected. " > 156 "(%llu > %d)\n", > 157 od->pdev.name, od->pdev.id, > 158 od->pm_lat_level, act_lat, > 159 odpl->activate_lat); > 160 } > 161 > 162 od->dev_wakeup_lat -= odpl->activate_lat; > 163 } > 164 > 165 return 0; > 166 } > > When that first pr_debug() triggers, UART's hwmod could be disabled, and > that would trigger the state I described above where you would keep on > calling pm_runtime_get_sync() forever ;-) > > isn't it enough to patch it like below: > > diff --git a/arch/arm/plat-omap/omap_device.c b/arch/arm/plat-omap/omap_device.c > index b6b4097..560f622 100644 > --- a/arch/arm/plat-omap/omap_device.c > +++ b/arch/arm/plat-omap/omap_device.c > @@ -114,8 +114,6 @@ static int _omap_device_activate(struct omap_device *od, u8 ignore_lat) > { > struct timespec a, b, c; > > - pr_debug("omap_device: %s: activating\n", od->pdev.name); > - > while (od->pm_lat_level > 0) { > struct omap_device_pm_latency *odpl; > unsigned long long act_lat = 0; > @@ -162,6 +160,8 @@ static int _omap_device_activate(struct omap_device *od, u8 ignore_lat) > od->dev_wakeup_lat -= odpl->activate_lat; > } > > + pr_debug("omap_device: %s: activated\n", od->pdev.name); > + > return 0; > } > Actually there is much more than this: <<SNIP>> diff --git a/arch/arm/mach-omap2/clock.c b/arch/arm/mach-omap2/clock.c index 180299e..221ffb9 100644 --- a/arch/arm/mach-omap2/clock.c +++ b/arch/arm/mach-omap2/clock.c @@ -12,7 +12,8 @@ * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. */ -#undef DEBUG +//#undef DEBUG +#define DEBUG #include <linux/kernel.h> #include <linux/list.h> @@ -254,14 +255,14 @@ void omap2_clk_disable(struct clk *clk) return; } - pr_debug("clock: %s: decrementing usecount\n", clk->name); +// pr_debug("clock: %s: decrementing usecount\n", clk->name); clk->usecount--; if (clk->usecount > 0) return; - pr_debug("clock: %s: disabling in hardware\n", clk->name); +// pr_debug("clock: %s: disabling in hardware\n", clk->name); if (clk->ops && clk->ops->disable) { trace_clock_disable(clk->name, 0, smp_processor_id()); @@ -290,14 +291,14 @@ int omap2_clk_enable(struct clk *clk) { int ret; - pr_debug("clock: %s: incrementing usecount\n", clk->name); +// pr_debug("clock: %s: incrementing usecount\n", clk->name); clk->usecount++; if (clk->usecount > 1) return 0; - pr_debug("clock: %s: enabling in hardware\n", clk->name); +// pr_debug("clock: %s: enabling in hardware\n", clk->name); if (clk->parent) { ret = omap2_clk_enable(clk->parent); diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c index 7ed0479..8ca7d40 100644 --- a/arch/arm/mach-omap2/omap_hwmod.c +++ b/arch/arm/mach-omap2/omap_hwmod.c @@ -124,7 +124,8 @@ * XXX error return values should be checked to ensure that they are * appropriate */ -#undef DEBUG +//#undef DEBUG +#define DEBUG #include <linux/kernel.h> #include <linux/errno.h> @@ -597,7 +598,8 @@ static int _enable_clocks(struct omap_hwmod *oh) { int i; - pr_debug("omap_hwmod: %s: enabling clocks\n", oh->name); + if (strcmp(oh->class->name, "uart")) + pr_debug("omap_hwmod: %s: enabling clocks\n", oh->name); if (oh->_clk) clk_enable(oh->_clk); @@ -627,7 +629,8 @@ static int _disable_clocks(struct omap_hwmod *oh) { int i; - pr_debug("omap_hwmod: %s: disabling clocks\n", oh->name); + if (strcmp(oh->class->name, "uart")) + pr_debug("omap_hwmod: %s: disabling clocks\n", oh->name); if (oh->_clk) clk_disable(oh->_clk); @@ -1232,7 +1235,8 @@ static int _enable(struct omap_hwmod *oh) return -EINVAL; } - pr_debug("omap_hwmod: %s: enabling\n", oh->name); + if (strcmp(oh->class->name, "uart")) + pr_debug("omap_hwmod: %s: enabling\n", oh->name); /* * If an IP contains only one HW reset line, then de-assert it in order @@ -1264,8 +1268,9 @@ static int _enable(struct omap_hwmod *oh) } } else { _disable_clocks(oh); - pr_debug("omap_hwmod: %s: _wait_target_ready: %d\n", - oh->name, r); + if (strcmp(oh->class->name, "uart")) + pr_debug("omap_hwmod: %s: _wait_target_ready: %d\n", + oh->name, r); } return r; @@ -1287,7 +1292,8 @@ static int _idle(struct omap_hwmod *oh) return -EINVAL; } - pr_debug("omap_hwmod: %s: idling\n", oh->name); + if (strcmp(oh->class->name, "uart")) + pr_debug("omap_hwmod: %s: idling\n", oh->name); if (oh->class->sysc) _idle_sysc(oh); diff --git a/arch/arm/plat-omap/omap_device.c b/arch/arm/plat-omap/omap_device.c index 49fc0df..7b27704 100644 --- a/arch/arm/plat-omap/omap_device.c +++ b/arch/arm/plat-omap/omap_device.c @@ -75,7 +75,8 @@ * (device must be reinitialized at this point to use it again) * */ -#undef DEBUG +//#undef DEBUG +#define DEBUG #include <linux/kernel.h> #include <linux/platform_device.h> @@ -114,7 +115,8 @@ static int _omap_device_activate(struct omap_device *od, u8 ignore_lat) { struct timespec a, b, c; - pr_debug("omap_device: %s: activating\n", od->pdev.name); + if (strcmp(od->hwmods[0]->class->name, "uart")) + pr_debug("omap_device: %s: activating\n", od->pdev.name); while (od->pm_lat_level > 0) { struct omap_device_pm_latency *odpl; @@ -138,25 +140,29 @@ static int _omap_device_activate(struct omap_device *od, u8 ignore_lat) c = timespec_sub(b, a); act_lat = timespec_to_ns(&c); - pr_debug("omap_device: %s: pm_lat %d: activate: elapsed time " - "%llu nsec\n", od->pdev.name, od->pm_lat_level, - act_lat); + if (strcmp(od->hwmods[0]->class->name, "uart")) + pr_debug("omap_device: %s: pm_lat %d: activate: elapsed time " + "%llu nsec\n", od->pdev.name, od->pm_lat_level, + act_lat); if (act_lat > odpl->activate_lat) { odpl->activate_lat_worst = act_lat; if (odpl->flags & OMAP_DEVICE_LATENCY_AUTO_ADJUST) { odpl->activate_lat = act_lat; - pr_warning("omap_device: %s.%d: new worst case " - "activate latency %d: %llu\n", - od->pdev.name, od->pdev.id, - od->pm_lat_level, act_lat); - } else - pr_warning("omap_device: %s.%d: activate " - "latency %d higher than exptected. " - "(%llu > %d)\n", - od->pdev.name, od->pdev.id, - od->pm_lat_level, act_lat, - odpl->activate_lat); + if (strcmp(od->hwmods[0]->class->name, "uart")) + pr_warning("omap_device: %s.%d: new worst case " + "activate latency %d: %llu\n", + od->pdev.name, od->pdev.id, + od->pm_lat_level, act_lat); + } else { + if (strcmp(od->hwmods[0]->class->name, "uart")) + pr_warning("omap_device: %s.%d: activate " + "latency %d higher than exptected. " + "(%llu > %d)\n", + od->pdev.name, od->pdev.id, + od->pm_lat_level, act_lat, + odpl->activate_lat); + } } od->dev_wakeup_lat -= odpl->activate_lat; @@ -183,7 +189,8 @@ static int _omap_device_deactivate(struct omap_device *od, u8 ignore_lat) { struct timespec a, b, c; - pr_debug("omap_device: %s: deactivating\n", od->pdev.name); + if (strcmp(od->hwmods[0]->class->name, "uart")) + pr_debug("omap_device: %s: deactivating\n", od->pdev.name); while (od->pm_lat_level < od->pm_lats_cnt) { struct omap_device_pm_latency *odpl; @@ -206,25 +213,29 @@ static int _omap_device_deactivate(struct omap_device *od, u8 ignore_lat) c = timespec_sub(b, a); deact_lat = timespec_to_ns(&c); - pr_debug("omap_device: %s: pm_lat %d: deactivate: elapsed time " - "%llu nsec\n", od->pdev.name, od->pm_lat_level, - deact_lat); + if (strcmp(od->hwmods[0]->class->name, "uart")) + pr_debug("omap_device: %s: pm_lat %d: deactivate: elapsed time " + "%llu nsec\n", od->pdev.name, od->pm_lat_level, + deact_lat); if (deact_lat > odpl->deactivate_lat) { odpl->deactivate_lat_worst = deact_lat; if (odpl->flags & OMAP_DEVICE_LATENCY_AUTO_ADJUST) { odpl->deactivate_lat = deact_lat; - pr_warning("omap_device: %s.%d: new worst case " - "deactivate latency %d: %llu\n", - od->pdev.name, od->pdev.id, - od->pm_lat_level, deact_lat); - } else - pr_warning("omap_device: %s.%d: deactivate " + if (strcmp(od->hwmods[0]->class->name, "uart")) + pr_warning("omap_device: %s.%d: new worst case " + "deactivate latency %d: %llu\n", + od->pdev.name, od->pdev.id, + od->pm_lat_level, deact_lat); + } else { + if (strcmp(od->hwmods[0]->class->name, "uart")) + pr_warning("omap_device: %s.%d: deactivate " "latency %d higher than exptected. " "(%llu > %d)\n", od->pdev.name, od->pdev.id, od->pm_lat_level, deact_lat, odpl->deactivate_lat); + } } <<SNIP>> > > either the above or something like: > > if (pm_runtime_suspended(dev)) > return 0; > > on console_write() ?? Consider the scenario where after bootup uart is idled with runtime auto suspend so now state of console uart is RPM_SUSPENDED. Now you connect a pendrive to ehci-port. Missing critical usb host <-> device_enumeration prints? uart_console rpm_suspended state -> usb_device mass storage device found print -> console_write -> return without printing. -- Thanks, Govindraj.R -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html