Re: [RFC v2]: Issues implementing clock handling mechanism within UART driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 29, 2011 at 7:32 PM, Felipe Balbi <balbi@xxxxxx> wrote:
> Hi,
>
> On Fri, Jul 29, 2011 at 06:28:32PM +0530, Govindraj wrote:
>> On Fri, Jul 29, 2011 at 5:49 PM, Felipe Balbi <balbi@xxxxxx> wrote:
>> > On Fri, Jul 29, 2011 at 05:29:12PM +0530, Govindraj wrote:
>> >> Yes fine, But there are scenarios even before first runtime_suspend happens,
>> >>
>> >> ex: uart_port_configure -> get_sync -> pm_generic_runtime_resume
>> >> (omap_device_enable in this case) debug printk -> console_write -> get_sync.
>> >>
>> >> there are numerous such scenarios where we end from runtime context
>> >> to runtime api context again, or jumping from one uart port operation
>> >> to uart print operation.
>> >
>> > calling pm_runtime_get_sync() should not be a problem. It should only
>> > increase the usage counters... This sounds like a race condition on the
>> > driver, no ?
>>
>> Actually when we call a API to enable clocks we except the internals of API
>> to just enable clocks and return.
>>
>> *Clock enable API should not cause or trigger to do a _device_driver_operation_
>> even before enabling clocks of the device-driver which called it*
>>
>> for uart context get_sync can land me to uart driver back
>> even before enabling the uart clocks due to printks.
>
> only if _you_ have prints or _your_ runtime_*() calls, no ?
>
> Let's say omap_hwmod.c wants to do a print:
>
> -> printk()
>  -> pm_runtime_get_sync
>    -> console_write
>  -> pm_runtim_put
>
> now, if you have a printk() on your runtime_resume() before you enable
> the clocks, then I can see why you would deadlock:
>
> -> pm_runtime_get_sync
>  -> omap_serial_runtime_resume
>    -> printk
>      -> pm_runtime_get_sync
>        -> omap_serial_runtime_resume
>          -> printk
>           -> pm_runtime_get_sync
>            .....
>
> maybe I'm missing something, but can you add a stack dump on your
> ->runtime_resume and ->runtime_suspend methods, just so we try to figure
> out who's to blame here ?
>
>> > What you're experiencing, if I understood correctly, is a deadlock ? In
>> > that case, can you try to track the locking mechanism on the omap-serial
>> > driver to try to find if there isn't anything obviously wrong ?
>> >
>>
>> Yes deadlocks. due to entering from runtime context to runtime context
>> or entering from uart_port_operation to uart_console_write ops.
>>
>> There are already port locks used extensively within the uart driver
>> to secure a port operation.
>>
>> But cannot secure a port operation while using clock_enable API.
>> since clock enable API can land the control back to uart_console_write
>> operation..
>
> but in that case, if clock isn't enabled, why don't you just ignore the
> print and enable the clock ?? Just return 0 and continue with
> clk_enable() ??
>
>> >> So either we should not have those prints from pm_generic layers or suppress
>> >> them(seems pretty much a problem for a clean design within the driver
>> >> using console_lock/unlock for every get_sync, and for
>> >> runtime_put we cannot suppress the prints as it gets scheduled later)
>> >>
>> >> or if other folks who really need those prints form pm_generic* layers
>> >> to debug and analysis then we have no other choice rather control
>> >> the clk_enable/disable from outside driver code in idle path.
>> >
>> > yeah, none of these would be nice :-(
>> >
>> > I think this needs more debugging to be sure what's exactly going on.
>> > What's exactly causing the deadlock ? Which lock is held and never
>> > released ?
>> >
>>
>> I had done some investigations, from scenarios it simply boils down to fact
>> to handle clock within uart driver, uart driver expects clock enable API* used
>> to just enable uart clocks but rather not trigger a _uart_ops_ within which
>> kind of unacceptable from the uart_driver context.
>
> ok, now I see what you mean:
>
> 113 static int _omap_device_activate(struct omap_device *od, u8 ignore_lat)
> 114 {
> 115         struct timespec a, b, c;
> 116
> 117         pr_debug("omap_device: %s: activating\n", od->pdev.name);
> 118
> 119         while (od->pm_lat_level > 0) {
> 120                 struct omap_device_pm_latency *odpl;
> 121                 unsigned long long act_lat = 0;
> 122
> 123                 od->pm_lat_level--;
> 124
> 125                 odpl = od->pm_lats + od->pm_lat_level;
> 126
> 127                 if (!ignore_lat &&
> 128                     (od->dev_wakeup_lat <= od->_dev_wakeup_lat_limit))
> 129                         break;
> 130
> 131                 read_persistent_clock(&a);
> 132
> 133                 /* XXX check return code */
> 134                 odpl->activate_func(od);
> 135
> 136                 read_persistent_clock(&b);
> 137
> 138                 c = timespec_sub(b, a);
> 139                 act_lat = timespec_to_ns(&c);
> 140
> 141                 pr_debug("omap_device: %s: pm_lat %d: activate: elapsed time "
> 142                          "%llu nsec\n", od->pdev.name, od->pm_lat_level,
> 143                          act_lat);
> 144
> 145                 if (act_lat > odpl->activate_lat) {
> 146                         odpl->activate_lat_worst = act_lat;
> 147                         if (odpl->flags & OMAP_DEVICE_LATENCY_AUTO_ADJUST) {
> 148                                 odpl->activate_lat = act_lat;
> 149                                 pr_warning("omap_device: %s.%d: new worst case "
> 150                                            "activate latency %d: %llu\n",
> 151                                            od->pdev.name, od->pdev.id,
> 152                                            od->pm_lat_level, act_lat);
> 153                         } else
> 154                                 pr_warning("omap_device: %s.%d: activate "
> 155                                            "latency %d higher than exptected. "
> 156                                            "(%llu > %d)\n",
> 157                                            od->pdev.name, od->pdev.id,
> 158                                            od->pm_lat_level, act_lat,
> 159                                            odpl->activate_lat);
> 160                 }
> 161
> 162                 od->dev_wakeup_lat -= odpl->activate_lat;
> 163         }
> 164
> 165         return 0;
> 166 }
>
> When that first pr_debug() triggers, UART's hwmod could be disabled, and
> that would trigger the state I described above where you would keep on
> calling pm_runtime_get_sync() forever ;-)
>
> isn't it enough to patch it like below:
>
> diff --git a/arch/arm/plat-omap/omap_device.c b/arch/arm/plat-omap/omap_device.c
> index b6b4097..560f622 100644
> --- a/arch/arm/plat-omap/omap_device.c
> +++ b/arch/arm/plat-omap/omap_device.c
> @@ -114,8 +114,6 @@ static int _omap_device_activate(struct omap_device *od, u8 ignore_lat)
>  {
>        struct timespec a, b, c;
>
> -       pr_debug("omap_device: %s: activating\n", od->pdev.name);
> -
>        while (od->pm_lat_level > 0) {
>                struct omap_device_pm_latency *odpl;
>                unsigned long long act_lat = 0;
> @@ -162,6 +160,8 @@ static int _omap_device_activate(struct omap_device *od, u8 ignore_lat)
>                od->dev_wakeup_lat -= odpl->activate_lat;
>        }
>
> +       pr_debug("omap_device: %s: activated\n", od->pdev.name);
> +
>        return 0;
>  }
>

Actually there is much more than this:

<<SNIP>>

diff --git a/arch/arm/mach-omap2/clock.c b/arch/arm/mach-omap2/clock.c
index 180299e..221ffb9 100644
--- a/arch/arm/mach-omap2/clock.c
+++ b/arch/arm/mach-omap2/clock.c
@@ -12,7 +12,8 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
-#undef DEBUG
+//#undef DEBUG
+#define DEBUG

 #include <linux/kernel.h>
 #include <linux/list.h>
@@ -254,14 +255,14 @@ void omap2_clk_disable(struct clk *clk)
 		return;
 	}

-	pr_debug("clock: %s: decrementing usecount\n", clk->name);
+//	pr_debug("clock: %s: decrementing usecount\n", clk->name);

 	clk->usecount--;

 	if (clk->usecount > 0)
 		return;

-	pr_debug("clock: %s: disabling in hardware\n", clk->name);
+//	pr_debug("clock: %s: disabling in hardware\n", clk->name);

 	if (clk->ops && clk->ops->disable) {
 		trace_clock_disable(clk->name, 0, smp_processor_id());
@@ -290,14 +291,14 @@ int omap2_clk_enable(struct clk *clk)
 {
 	int ret;

-	pr_debug("clock: %s: incrementing usecount\n", clk->name);
+//	pr_debug("clock: %s: incrementing usecount\n", clk->name);

 	clk->usecount++;

 	if (clk->usecount > 1)
 		return 0;

-	pr_debug("clock: %s: enabling in hardware\n", clk->name);
+//	pr_debug("clock: %s: enabling in hardware\n", clk->name);

 	if (clk->parent) {
 		ret = omap2_clk_enable(clk->parent);
diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
index 7ed0479..8ca7d40 100644
--- a/arch/arm/mach-omap2/omap_hwmod.c
+++ b/arch/arm/mach-omap2/omap_hwmod.c
@@ -124,7 +124,8 @@
  * XXX error return values should be checked to ensure that they are
  * appropriate
  */
-#undef DEBUG
+//#undef DEBUG
+#define DEBUG

 #include <linux/kernel.h>
 #include <linux/errno.h>
@@ -597,7 +598,8 @@ static int _enable_clocks(struct omap_hwmod *oh)
 {
 	int i;

-	pr_debug("omap_hwmod: %s: enabling clocks\n", oh->name);
+	if (strcmp(oh->class->name, "uart"))
+		pr_debug("omap_hwmod: %s: enabling clocks\n", oh->name);

 	if (oh->_clk)
 		clk_enable(oh->_clk);
@@ -627,7 +629,8 @@ static int _disable_clocks(struct omap_hwmod *oh)
 {
 	int i;

-	pr_debug("omap_hwmod: %s: disabling clocks\n", oh->name);
+	if (strcmp(oh->class->name, "uart"))
+		pr_debug("omap_hwmod: %s: disabling clocks\n", oh->name);

 	if (oh->_clk)
 		clk_disable(oh->_clk);
@@ -1232,7 +1235,8 @@ static int _enable(struct omap_hwmod *oh)
 		return -EINVAL;
 	}

-	pr_debug("omap_hwmod: %s: enabling\n", oh->name);
+	if (strcmp(oh->class->name, "uart"))
+		pr_debug("omap_hwmod: %s: enabling\n", oh->name);

 	/*
 	 * If an IP contains only one HW reset line, then de-assert it in order
@@ -1264,8 +1268,9 @@ static int _enable(struct omap_hwmod *oh)
 		}
 	} else {
 		_disable_clocks(oh);
-		pr_debug("omap_hwmod: %s: _wait_target_ready: %d\n",
-			 oh->name, r);
+		if (strcmp(oh->class->name, "uart"))
+			pr_debug("omap_hwmod: %s: _wait_target_ready: %d\n",
+				 oh->name, r);
 	}

 	return r;
@@ -1287,7 +1292,8 @@ static int _idle(struct omap_hwmod *oh)
 		return -EINVAL;
 	}

-	pr_debug("omap_hwmod: %s: idling\n", oh->name);
+	if (strcmp(oh->class->name, "uart"))
+		pr_debug("omap_hwmod: %s: idling\n", oh->name);

 	if (oh->class->sysc)
 		_idle_sysc(oh);
diff --git a/arch/arm/plat-omap/omap_device.c b/arch/arm/plat-omap/omap_device.c
index 49fc0df..7b27704 100644
--- a/arch/arm/plat-omap/omap_device.c
+++ b/arch/arm/plat-omap/omap_device.c
@@ -75,7 +75,8 @@
  * (device must be reinitialized at this point to use it again)
  *
  */
-#undef DEBUG
+//#undef DEBUG
+#define DEBUG

 #include <linux/kernel.h>
 #include <linux/platform_device.h>
@@ -114,7 +115,8 @@ static int _omap_device_activate(struct
omap_device *od, u8 ignore_lat)
 {
 	struct timespec a, b, c;

-	pr_debug("omap_device: %s: activating\n", od->pdev.name);
+	if (strcmp(od->hwmods[0]->class->name, "uart"))
+		pr_debug("omap_device: %s: activating\n", od->pdev.name);

 	while (od->pm_lat_level > 0) {
 		struct omap_device_pm_latency *odpl;
@@ -138,25 +140,29 @@ static int _omap_device_activate(struct
omap_device *od, u8 ignore_lat)
 		c = timespec_sub(b, a);
 		act_lat = timespec_to_ns(&c);

-		pr_debug("omap_device: %s: pm_lat %d: activate: elapsed time "
-			 "%llu nsec\n", od->pdev.name, od->pm_lat_level,
-			 act_lat);
+		if (strcmp(od->hwmods[0]->class->name, "uart"))
+			pr_debug("omap_device: %s: pm_lat %d: activate: elapsed time "
+				 "%llu nsec\n", od->pdev.name, od->pm_lat_level,
+				 act_lat);

 		if (act_lat > odpl->activate_lat) {
 			odpl->activate_lat_worst = act_lat;
 			if (odpl->flags & OMAP_DEVICE_LATENCY_AUTO_ADJUST) {
 				odpl->activate_lat = act_lat;
-				pr_warning("omap_device: %s.%d: new worst case "
-					   "activate latency %d: %llu\n",
-					   od->pdev.name, od->pdev.id,
-					   od->pm_lat_level, act_lat);
-			} else
-				pr_warning("omap_device: %s.%d: activate "
-					   "latency %d higher than exptected. "
-					   "(%llu > %d)\n",
-					   od->pdev.name, od->pdev.id,
-					   od->pm_lat_level, act_lat,
-					   odpl->activate_lat);
+				if (strcmp(od->hwmods[0]->class->name, "uart"))
+					pr_warning("omap_device: %s.%d: new worst case "
+						"activate latency %d: %llu\n",
+						od->pdev.name, od->pdev.id,
+						od->pm_lat_level, act_lat);
+			} else {
+				if (strcmp(od->hwmods[0]->class->name, "uart"))
+					pr_warning("omap_device: %s.%d: activate "
+						"latency %d higher than exptected. "
+						"(%llu > %d)\n",
+						od->pdev.name, od->pdev.id,
+						od->pm_lat_level, act_lat,
+						odpl->activate_lat);
+			}
 		}

 		od->dev_wakeup_lat -= odpl->activate_lat;
@@ -183,7 +189,8 @@ static int _omap_device_deactivate(struct
omap_device *od, u8 ignore_lat)
 {
 	struct timespec a, b, c;

-	pr_debug("omap_device: %s: deactivating\n", od->pdev.name);
+	if (strcmp(od->hwmods[0]->class->name, "uart"))
+		pr_debug("omap_device: %s: deactivating\n", od->pdev.name);

 	while (od->pm_lat_level < od->pm_lats_cnt) {
 		struct omap_device_pm_latency *odpl;
@@ -206,25 +213,29 @@ static int _omap_device_deactivate(struct
omap_device *od, u8 ignore_lat)
 		c = timespec_sub(b, a);
 		deact_lat = timespec_to_ns(&c);

-		pr_debug("omap_device: %s: pm_lat %d: deactivate: elapsed time "
-			 "%llu nsec\n", od->pdev.name, od->pm_lat_level,
-			 deact_lat);
+		if (strcmp(od->hwmods[0]->class->name, "uart"))
+			pr_debug("omap_device: %s: pm_lat %d: deactivate: elapsed time "
+				 "%llu nsec\n", od->pdev.name, od->pm_lat_level,
+				 deact_lat);

 		if (deact_lat > odpl->deactivate_lat) {
 			odpl->deactivate_lat_worst = deact_lat;
 			if (odpl->flags & OMAP_DEVICE_LATENCY_AUTO_ADJUST) {
 				odpl->deactivate_lat = deact_lat;
-				pr_warning("omap_device: %s.%d: new worst case "
-					   "deactivate latency %d: %llu\n",
-					   od->pdev.name, od->pdev.id,
-					   od->pm_lat_level, deact_lat);
-			} else
-				pr_warning("omap_device: %s.%d: deactivate "
+				if (strcmp(od->hwmods[0]->class->name, "uart"))
+					pr_warning("omap_device: %s.%d: new worst case "
+						   "deactivate latency %d: %llu\n",
+						   od->pdev.name, od->pdev.id,
+						   od->pm_lat_level, deact_lat);
+			} else {
+				if (strcmp(od->hwmods[0]->class->name, "uart"))
+					pr_warning("omap_device: %s.%d: deactivate "
 					   "latency %d higher than exptected. "
 					   "(%llu > %d)\n",
 					   od->pdev.name, od->pdev.id,
 					   od->pm_lat_level, deact_lat,
 					   odpl->deactivate_lat);
+			}
 		}


<<SNIP>>


>
> either the above or something like:
>
> if (pm_runtime_suspended(dev))
>        return 0;
>
> on console_write() ??

Consider the scenario where after bootup uart is idled
with runtime auto suspend so now state of console
uart is RPM_SUSPENDED.

Now you connect a pendrive to ehci-port.

Missing critical usb host <-> device_enumeration prints?

uart_console rpm_suspended state ->

usb_device mass storage device found print ->

console_write -> return without printing.

--
Thanks,
Govindraj.R
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Arm (vger)]     [ARM Kernel]     [ARM MSM]     [Linux Tegra]     [Linux WPAN Networking]     [Linux Wireless Networking]     [Maemo Users]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux