Re: Shutdown regression due to mmc workqueue/locking changes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Oct 30, 2013 at 2:05 PM, Stephen Warren <swarren@xxxxxxxxxxxxx> wrote:
> When I run "shutdown -h now" in next-20131030, I get a hung task.
>
> git bisect shows that the first problem that shows up is due to 35a8fca
> "mmc: sdhci: Turn tuning timeout timer into delayed work", which yields
> the following crash immediately that poweroff happens:
>
>> [  76.314514] BUG: scheduling while atomic: halt/1367/0x00000002
>> [  76.320614] Modules linked in:
>> [  76.323802] CPU: 2 PID: 1367 Comm: halt Not tainted 3.12.0-rc2-00141-g35a8fca #147
>> [  76.331682] [<c00158f0>] (unwind_backtrace+0x0/0xf8) from [<c00117a8>] (show_stack+0x10/0x14)
>> [  76.340501] [<c00117a8>] (show_stack+0x10/0x14) from [<c053a6a4>] (dump_stack+0x80/0xc4)
>> [  76.348860] [<c053a6a4>] (dump_stack+0x80/0xc4) from [<c0539034>] (__schedule_bug+0x48/0x60)
>> [  76.357559] [<c0539034>] (__schedule_bug+0x48/0x60) from [<c053eb8c>] (__schedule+0x64c/0x6d0)
>> [  76.366404] [<c053eb8c>] (__schedule+0x64c/0x6d0) from [<c053bb54>] (schedule_timeout+0x174/0x1ec)
>> [  76.375552] [<c053bb54>] (schedule_timeout+0x174/0x1ec) from [<c053e2d8>] (wait_for_common+0xb8/0x14c)
>> [  76.385074] [<c053e2d8>] (wait_for_common+0xb8/0x14c) from [<c0037ed4>] (__flush_work+0xa4/0x12c)
>> [  76.394078] [<c0037ed4>] (__flush_work+0xa4/0x12c) from [<c038677c>] (sdhci_do_set_ios+0x90/0x6fc)
>> [  76.403220] [<c038677c>] (sdhci_do_set_ios+0x90/0x6fc) from [<c0386e08>] (sdhci_set_ios+0x20/0x2c)
>> [  76.412380] [<c0386e08>] (sdhci_set_ios+0x20/0x2c) from [<c0374610>] (mmc_power_off+0x70/0x90)
>> [  76.421212] [<c0374610>] (mmc_power_off+0x70/0x90) from [<c03763a0>] (_mmc_suspend+0xb4/0x274)
>> [  76.430037] [<c03763a0>] (_mmc_suspend+0xb4/0x274) from [<c0375450>] (mmc_bus_shutdown+0x40/0x68)
>> [  76.439129] [<c0375450>] (mmc_bus_shutdown+0x40/0x68) from [<c029f324>] (device_shutdown+0x34/0x188)
>> [  76.448478] [<c029f324>] (device_shutdown+0x34/0x188) from [<c0044fcc>] (kernel_power_off+0x2c/0x70)
>> [  76.457814] [<c0044fcc>] (kernel_power_off+0x2c/0x70) from [<c00452b8>] (SyS_reboot+0x108/0x1d4)
>> [  76.466812] [<c00452b8>] (SyS_reboot+0x108/0x1d4) from [<c000e5e0>] (ret_fast_syscall+0x0/0x30)
>
> A few commits later, 769303b "mmc: sdhci: Turn host->lock into a mutex"
> changes this to a hang, followed by a hung task being detected:
>
>> [ 240.870402] INFO: task kworker/1:1:70 blocked for more than 120 seconds.
>> [ 240.877203]      Not tainted 3.12.0-rc7-next-20131030 #133
>> [ 240.882886] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 240.890824] kworker/1:1    D c0554ec0    0   70     2 0x00000000
>> [ 240.897345] Workqueue: events sdhci_tuning_timeout_work
>> [ 240.902734] [<c0554ec0>] (__schedule+0x300/0x674) from [<c0555614>] (schedule_preempt_disabled+0x24/0x34)
>> [ 240.912461] [<c0555614>] (schedule_preempt_disabled+0x24/0x34) from [<c0554034>] (__mutex_lock_slowpath+0x17c/0x374)
>> [ 240.923146] [<c0554034>] (__mutex_lock_slowpath+0x17c/0x374) from [<c0554238>] (mutex_lock+0xc/0x24)
>> [ 240.932430] [<c0554238>] (mutex_lock+0xc/0x24) from [<c0393614>] (sdhci_tuning_timeout_work+0x14/0x2c)
>> [ 240.941890] [<c0393614>] (sdhci_tuning_timeout_work+0x14/0x2c) from [<c00398b4>] (process_one_work+0xfc/0x34c)
>> [ 240.952033] [<c00398b4>] (process_one_work+0xfc/0x34c) from [<c0039b28>] (process_scheduled_works+0x24/0x34)
>> [ 240.961998] [<c0039b28>] (process_scheduled_works+0x24/0x34) from [<c0039ff4>] (worker_thread+0x228/0x384)
>> [ 240.971787] [<c0039ff4>] (worker_thread+0x228/0x384) from [<c003f748>] (kthread+0xc4/0xe0)
>> [ 240.980160] [<c003f748>] (kthread+0xc4/0xe0) from [<c000e6f8>] (ret_from_fork+0x14/0x3c)
>> [ 240.988366] INFO: task halt:1272 blocked for more than 120 seconds.
>> [ 240.994724]      Not tainted 3.12.0-rc7-next-20131030 #133
>> [ 241.000373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 241.008273] halt           D c0554ec0    0 1272  1270 0x00000000
>> [ 241.014820] [<c0554ec0>] (__schedule+0x300/0x674) from [<c05521e4>] (schedule_timeout+0x174/0x1ec)
>> [ 241.023919] [<c05521e4>] (schedule_timeout+0x174/0x1ec) from [<c0554958>] (wait_for_common+0xb8/0x14c)
>> [ 241.033359] [<c0554958>] (wait_for_common+0xb8/0x14c) from [<c003839c>] (__flush_work+0xa4/0x12c)
>> [ 241.042362] [<c003839c>] (__flush_work+0xa4/0x12c) from [<c0396c4c>] (sdhci_do_set_ios+0x8c/0x6e8)
>> [ 241.051448] [<c0396c4c>] (sdhci_do_set_ios+0x8c/0x6e8) from [<c03972c8>] (sdhci_set_ios+0x20/0x2c)
>> [ 241.060553] [<c03972c8>] (sdhci_set_ios+0x20/0x2c) from [<c03846c0>] (mmc_power_off+0x5c/0x7c)
>> [ 241.069275] [<c03846c0>] (mmc_power_off+0x5c/0x7c) from [<c0386634>] (_mmc_suspend+0x194/0x2a0)
>> [ 241.078110] [<c0386634>] (_mmc_suspend+0x194/0x2a0) from [<c03855e8>] (mmc_bus_shutdown+0x40/0x68)
>> [ 241.087339] [<c03855e8>] (mmc_bus_shutdown+0x40/0x68) from [<c02acd38>] (device_shutdown+0x34/0x188)
>> [ 241.096627] [<c02acd38>] (device_shutdown+0x34/0x188) from [<c0044e94>] (kernel_power_off+0x2c/0x70)
>> [ 241.105893] [<c0044e94>] (kernel_power_off+0x2c/0x70) from [<c0045180>] (SyS_reboot+0x108/0x1d4)
>> [ 241.114814] [<c0045180>] (SyS_reboot+0x108/0x1d4) from [<c000e660>] (ret_fast_syscall+0x0/0x30)
>
> This is on the Tegra114 Dalmore board, which is an ARM system. Looking
> at the backtraces, I'd guess that all Tegra devices are affected, but
> haven't tested that. I haven't looked at the commits to see if there's
> anything obviously wrong, but perhaps you can suggest something to try?

I'm not seeing it on Trimslice or Beaver, but I did hit another
regression by the same patch on dove:

http://marc.info/?l=linux-mmc&m=138316145408963


-Olof
--
To unsubscribe from this list: send the line "unsubscribe linux-next" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel]     [Linux USB Development]     [Yosemite News]     [Linux SCSI]

  Powered by Linux