Re: [PATCH 0/2] USB: musb: PM fixes

Johan Hovold <johan@xxxxxxxxxx> · Thu, 7 Sep 2017 14:04:29 +0200

On Tue, Sep 05, 2017 at 04:21:10PM +0200, Johan Hovold wrote:
> These patches fix a couple of bugs introduced by the recent runtime-PM
> work. 
> 
> Note that the external abort was due to the irq work never being flushed
> on suspend, and that we may need similar fixes for the delayed reset and
> resume work which are likewise never cancelled on suspend.

Looks like there are even more issues with musb suspend.

With this series, which allows the controller to runtime suspend upon
system resume, I can now trigger the following external abort at resume:

PM: Finishing wakeup.
OOM killer enabled.
Restarting tasks ... done.
hrtimer: interrupt took 191917 ns
Unhandled fault: external abort on non-linefetch (0x1008) at 0xc8249412
pgd = c0004000
[c8249412] *pgd=87350811, *pte=47401653, *ppte=47401453
Internal error: : 1008 [#1] PREEMPT ARM
Modules linked in:
CPU: 0 PID: 572 Comm: kworker/0:2 Not tainted 4.12.0 #34
Hardware name: Generic AM33XX (Flattened Device Tree)
Workqueue: pm pm_runtime_work
task: c72057c0 task.stack: c722a000
PC is at musb_default_readw+0x4/0x10
LR is at musb_is_tx_fifo_empty+0x3c/0x48
<snip>
[<c03ce444>] (musb_default_readw) from [<c03d4f68>] (musb_is_tx_fifo_empty+0x3c/0x48)
[<c03d4f68>] (musb_is_tx_fifo_empty) from [<c03d5880>] (cppi41_recheck_tx_req+0x5c/0x118)
[<c03d5880>] (cppi41_recheck_tx_req) from [<c016caf8>] (__hrtimer_run_queues.constprop.4+0x110/0x1bc)
[<c016caf8>] (__hrtimer_run_queues.constprop.4) from [<c016cfa4>] (hrtimer_interrupt+0x98/0x230)
[<c016cfa4>] (hrtimer_interrupt) from [<c0114018>] (omap2_gp_timer_interrupt+0x28/0x30)
[<c0114018>] (omap2_gp_timer_interrupt) from [<c015bc08>] (__handle_irq_event_percpu+0x88/0x124)
[<c015bc08>] (__handle_irq_event_percpu) from [<c015bcc0>] (handle_irq_event_percpu+0x1c/0x58)
[<c015bcc0>] (handle_irq_event_percpu) from [<c015bd48>] (handle_irq_event+0x4c/0x84)
[<c015bd48>] (handle_irq_event) from [<c015ebd8>] (handle_level_irq+0xb0/0x15c)
[<c015ebd8>] (handle_level_irq) from [<c015af34>] (generic_handle_irq+0x24/0x34)
[<c015af34>] (generic_handle_irq) from [<c015b4c0>] (__handle_domain_irq+0x70/0xdc)
[<c015b4c0>] (__handle_domain_irq) from [<c010c20c>] (__irq_svc+0x6c/0xa8)
[<c010c20c>] (__irq_svc) from [<c01168f4>] (omap_hwmod_idle+0x30/0x74)
[<c01168f4>] (omap_hwmod_idle) from [<c0117cb8>] (omap_device_idle+0x40/0x90)
[<c0117cb8>] (omap_device_idle) from [<c0360f88>] (__rpm_callback+0x15c/0x258)
[<c0360f88>] (__rpm_callback) from [<c03610d4>] (rpm_callback+0x50/0x80)
[<c03610d4>] (rpm_callback) from [<c0360000>] (rpm_suspend+0xe0/0x548)
[<c0360000>] (rpm_suspend) from [<c036199c>] (pm_runtime_work+0xac/0xbc)
[<c036199c>] (pm_runtime_work) from [<c013c0c0>] (process_one_work+0x11c/0x350)
[<c013c0c0>] (process_one_work) from [<c013c32c>] (worker_thread+0x38/0x55c)
[<c013c32c>] (worker_thread) from [<c0141a00>] (kthread+0x100/0x130)
[<c0141a00>] (kthread) from [<c0108418>] (ret_from_fork+0x14/0x3c)

after having suspended with an active ECM gadget.

Turns out system suspend breaks musb in gadget mode. It seems I need to
manually restart the gadget to get it to work again, even if it had just
been enumerated (a case which does not trigger the above crash). (Bug 1)

But if an ECM gadget is also active (e.g. open SSH session) when
suspending, this in turn can trigger yet another bug: the early_tx
dma-irq hrtimer is never cancelled if the tx-fifo does not empty after
the gadget driver initiates a transfer on resume. The early_tx timer
keeps rescheduling itself until the gadget is stopped manually (keeping
the BBB CPU busy at about 20-30%). (Bug 2)

If the controller is allowed to runtime suspend after system resume, as
with this series, this repeated scheduling triggers the above external
abort.
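
To make the interaction between Bug 2 and runtime suspend concrete, here
is a small userspace model of it. This is only a sketch: every name in
it (model_ctrl, model_timer_fire, model_run, ...) is made up for
illustration and none of it is musb or cppi41 code. It assumes the timer
callback reads a controller register on each expiry and re-arms itself
while the tx-fifo is not empty, and it counts a "fault" whenever that
read happens with the controller powered down. The cancel_timer switch
models one possible mitigation (cancelling the timer before powering
down); it is an assumption, not what the respun series does.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_ctrl {
	bool powered;       /* false once the controller is runtime suspended */
	bool tx_fifo_empty; /* stays false in the Bug 2 scenario */
	bool timer_armed;   /* models the early_tx timer being queued */
	int faults;         /* register reads attempted while unpowered */
	int reschedules;    /* times the callback re-armed itself */
};

/* Models a register read: touching an unpowered block counts as an abort. */
static bool model_read_fifo_empty(struct model_ctrl *c)
{
	if (!c->powered)
		c->faults++;
	return c->tx_fifo_empty;
}

/* Models one timer expiry: re-arms for as long as the fifo is not empty. */
static void model_timer_fire(struct model_ctrl *c)
{
	if (!c->timer_armed)
		return;
	if (model_read_fifo_empty(c)) {
		c->timer_armed = false;
		return;
	}
	c->reschedules++; /* timer stays armed */
}

/* Models runtime suspend; cancel_timer is the assumed mitigation. */
static void model_runtime_suspend(struct model_ctrl *c, bool cancel_timer)
{
	if (cancel_timer)
		c->timer_armed = false;
	c->powered = false;
}

/* Runs the scenario and returns the number of faulting register reads. */
int model_run(bool cancel_timer, int expiries)
{
	struct model_ctrl c = {
		.powered = true,
		.tx_fifo_empty = false,
		.timer_armed = true,
	};

	model_timer_fire(&c); /* one expiry while powered: just re-arms */
	model_runtime_suspend(&c, cancel_timer);
	for (int i = 0; i < expiries; i++)
		model_timer_fire(&c);
	return c.faults;
}
```

In this model, model_run(false, 3) returns 3 (every expiry after the
suspend touches a powered-down register), while model_run(true, 3)
returns 0.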

I've respun the series so that the session flag and runtime pm count are
left untouched unless we've already started the session-quirk timeout
handling.

This avoids the above crash, but does not address another problem with
the current code, namely that the controller is left active in case a
device is disconnected while suspended in host mode. (Bug 3)

Johan